PREFACE


The purpose of this work. This book presents a new approach to the 
old problem of induction and probability. The theory here developed is 
characterized by the following basic conceptions: (1) all inductive reason- 
ing, in the wide sense of nondeductive or nondemonstrative reasoning, is 
reasoning in terms of probability; (2) hence inductive logic, the theory of 
the principles of inductive reasoning, is the same as probability logic; (3) 
the concept of probability on which inductive logic is to be based is a 
logical relation between two statements or propositions; it is the degree of 
confirmation of a hypothesis (or conclusion) on the basis of some given 
evidence (or premises); (4) the so-called frequency concept of probability,
as used in statistical investigations, is an important scientific concept in 
its own right, but it is not suitable as the basic concept of inductive logic; 
(5) all principles and theorems of inductive logic are analytic; (6) hence 
the validity of inductive reasoning is not dependent upon any synthetic 
presuppositions like the much debated principle of the uniformity of the 
world. One of the tasks of this book is the discussion of the general philo- 
sophical problems concerning the nature of probability and inductive rea- 
soning, which will lead to the conceptions just mentioned. However, the 
major aim of the book extends beyond this. It is the actual construction of 
a system of inductive logic, a theory based on the conceptions indicated 
but supplying proofs for many theorems concerning such concepts as the 
quantitative concept of degree of confirmation, relevance and irrelevance, 
the (comparative) concept of stronger confirmation, and a general method 
of estimation. This system will be constructed with the help of the meth- 
ods of symbolic logic and semantics. (However, previous knowledge of 
these fields is not necessarily required; all symbols and technical terms
used will be explained in this book.) In this way it will for the first time be
possible to construct a system of inductive logic that can take its rightful 
place beside the modern, exact systems of deductive logic. The system to 
be constructed here is not yet applicable to the entire language of science 
with its quantitative magnitudes like mass, temperature, etc., but only to 
a language system that is much simpler (corresponding to what is known 
technically as lower functional logic including relations and identity) 
though more comprehensive than the language to which deductive logic 
was restricted for more than two thousand years, from Aristotle to Boole. 
Since this book seeks to combine various purposes, it contains material
of various kinds. In preparation for the construction of a new system of 
inductive logic, general discussions of a philosophical or methodological 
nature are given (in chaps. i, ii, and iv); their purpose is argument 
and clarification; they are intended to lead to an understanding of 
the basic conception of the nature of probability and induction which is 
here accepted as a foundation for the construction of the system. The 
second part of the book (chaps. v-ix) carries out the construction of the 
system. This part contains less argumentation; it proceeds more geo- 
metrico, by the technical steps of definitions and proofs for theorems. One 
purpose of this part is to show by example what kinds of problems can be 
dealt with and solved in these fundamental parts of inductive logic. The 
other purpose lies in the results themselves. Many of the theorems (espe- 
cially in chaps. v and viii) are known from the classical theory of probabil- 
ity; the purpose of restating them here lies in their more exact formulation 
and interpretation and in their proofs within the new framework. Many 
other theorems are stated and proved here for the first time (especially in 
chaps. vi, vii, and ix). Many of the theorems (both of deductive logic in 
chap. iii and of inductive logic in later chapters) are listed chiefly for refer- 
ence purposes; they are not meant to be read through all at once. The 
reader will easily find those items that are of interest to him. To aid him, 
each chapter and each section is preceded by a summary (I often wonder 
why many of the books I have to read do not help me in the same way; 
could it be that the authors wish to compel me to read every word they 
have written?); the most important definitions and theorems are marked 
by ‘+’; many theorems are accompanied by brief remarks in nontechnical 
language indicating their contents and functions. Material not absolutely 
necessary for an understanding of the main text is printed in small type, 
e.g., digressions into more technical problems, examples, proofs, references 
to other authors, etc. A glossary is given near the end of the book, provid- 
ing informal explanations for the technical terms most frequently used. 
(The theorems and definitions in this book are labeled for reference pur- 
poses in the following manner. Each theorem carries a mark like ‘T 20-5’, 
meaning ‘theorem No. 5 in § 20’; a theorem often contains parts marked 
by letters ‘a’, ‘b’, etc. Definitions are labeled in a similar way with ‘D’ 
instead of ‘T’. A reference ‘T5c’ occurring in § 20 is meant to refer to
T20-5, part c. The numbers assigned to the sections are not always consecutive;
sometimes a number has been omitted in order to make possible
later insertions; the same holds for the numbers of theorems within a section,
and for the letters assigned to parts within a theorem.)
A reader who is chiefly interested in general philosophical problems and
less in technical developments might first read chapters i, ii, and iv and 
then the following sections in other chapters: §§ 14-20 on the form and 
semantics of our language systems; §§ 79-81 on the comparative concept 
of confirmation; §§ 86-88 on the concept of confirming evidence; §§ 98- 
100 on estimation. The reader who is familiar with the classical theory of 
probability or with a modern theory based on the classical conception, and 
who wants to find out the relation between our theory and the classical 
one, is referred to chapters ii, iv, and viii. An adherent of the frequency 
conception of probability in the form of either R. von Mises or H. Reichen- 
bach might be interested in chapters ii and iv (esp. §§ 41-44). If a reader 
acquainted with the methods of modern mathematical statistics, e.g., of 
the school of R. A. Fisher or of that of J. Neyman, E. S. Pearson, and 
A. Wald, is looking for a logical foundation of statistical inference, testing 
of hypotheses, and estimation, he might read chapters ii (esp. §§ 9 and 
10), iv (esp. §§ 41-44, 50, 51), viii (§§ 94-96), and, above all, ix (esp.
§§ 98-100). If somebody is interested, from the point of view either of 
applied ethics or of mathematical economics, in the problem as to how a 
rational agent should determine his practical decisions and what function 
inductive logic has in this context, he is referred to §§ 50 and 51. 


The present volume is the first in a projected two-volume work, Probability and 
Induction. It begins with a brief introductory chapter which does not deal with prob- 
ability but with the general problem of explication, that is, the task of finding an 
exactly defined concept, an “explicatum”, to take the place of a given concept, the
“explicandum”, which is in practical use but not yet defined exactly. One of the main
tasks of any new theory of probability is to supply adequate explicata for the concept 
of probability and for the methods of inductive reasoning which are at present applied 
in science and statistics. However, there does not seem to be sufficient clarity and 
agreement concerning the requirements that an adequate explicatum for any ex- 
plicandum must fulfil. Therefore it seemed advisable to include a chapter on explication 
in this book, although this topic should be dealt with more appropriately in a book 
on concept formation in science. Chapter iii lies likewise outside the field of probability. 
It gives a survey of those parts of deductive logic which are needed as a basis for our 
construction of inductive logic. But the particular form of deductive logic here chosen 
may also be of interest in its own right. The system here constructed does not have the 
customary form of a logical calculus, based on primitive sentences and rules of infer- 
ence; it takes the form of an interpreted system. Therefore the theory of the system 
does not belong to the field known as logical syntax but to that of semantics. The basic 
concepts of deductive logic, e.g., logical truth and logical implication, are here explicated
as semantical concepts, defined with the help of state-descriptions, i.e., sentences
describing possible worlds, and the concept of the range of a sentence, i.e., the class of 
those state-descriptions in which the sentence holds. Chapter iii serves as an introduc- 
tion to this new semantical method of dealing with deductive logic, and, furthermore, 
it provides a comprehensive collection of theorems for the purpose of reference in later
proofs of theorems in inductive logic.
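
The semantical notions just described can be illustrated by a minimal sketch. It assumes a toy language with two individuals and one primitive predicate; the names `state_descriptions`, `sentence_range`, `is_l_true`, and `l_implies` are illustrative only and do not come from the text. Sentences are modelled, again by assumption, as functions from a state-description to a truth value.

```python
from itertools import product

# Hypothetical toy language: two individuals and one primitive predicate P.
INDIVIDUALS = ("a", "b")
ATOMS = tuple(f"P{x}" for x in INDIVIDUALS)  # atomic sentences 'Pa', 'Pb'


def state_descriptions():
    """Every assignment of True/False to the atomic sentences."""
    return [dict(zip(ATOMS, values))
            for values in product((True, False), repeat=len(ATOMS))]


def sentence_range(sentence):
    """Indices of the state-descriptions in which the sentence holds."""
    return frozenset(i for i, sd in enumerate(state_descriptions())
                     if sentence(sd))


def is_l_true(sentence):
    """Logical truth: the sentence holds in every state-description."""
    return len(sentence_range(sentence)) == len(state_descriptions())


def l_implies(premise, conclusion):
    """Logical implication: the premise's range lies inside the conclusion's."""
    return sentence_range(premise) <= sentence_range(conclusion)


# Sentences are modelled as functions from a state-description to a truth value.
pa = lambda sd: sd["Pa"]
pa_or_not_pa = lambda sd: sd["Pa"] or not sd["Pa"]
pa_and_pb = lambda sd: sd["Pa"] and sd["Pb"]

print(is_l_true(pa_or_not_pa))   # True
print(is_l_true(pa))             # False
print(l_implies(pa_and_pb, pa))  # True
```

On this reading, a sentence is logically true when its range is the whole class of state-descriptions, and one sentence logically implies another when the first range is contained in the second.
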
Chapters ii and iv contain detailed general discussions on probability and induction.
It is shown in chapter ii that the term ‘probability’ as used by scientists covers two 
quite different explicanda, called here ‘probability₁’ and ‘probability₂’. The former
characterizes the status of any scientific hypothesis, e.g., a prediction or a law, with
respect to given evidence; this concept is explicated by the concept of degree of confirmation,
which will serve as the basic concept of inductive logic. The concept of probability₂
means the relative frequency of a kind of event in a long sequence of events.
This concept is used in science and statistics for the description and statistical analysis 
of mass phenomena. Since both concepts are useful and practically indispensable for 
science, it is important that explications be given and theories developed for both of 
them. Therefore it seems to me that the long and violent controversy between the 
“frequency school” and the “logical school” of probability over the question as to
which of the two camps is in possession of “the right conception of probability” does 
not serve any useful purpose. Chapter iv discusses further the nature and meaning of 
the logical concept of probability, and the problems and difficulties involved in finding 
a concept of degree of confirmation as a quantitative explicatum for probability₁.
Assuming that it were possible to find an explicatum of this kind and to construct a 
system of inductive logic on its basis, the questions of the usefulness of such a system 
both for the theoretical purposes of science and for the practical purposes of deter- 
mining the best decisions for action in given situations are discussed. In the latter con- 
text, the utilization of estimates of unknown values of magnitudes is analyzed and, in 
particular, the rule of maximizing the estimated utility resulting from a chosen course 
of action. This discussion intends to clarify a problem that is of much concern in con- 
temporary mathematical economics. 

The second part of this book, consisting of chapters v-ix, contains a technical construction
of the fundamental parts of inductive logic based on the general conceptions
developed in the first part. First, the concept of a confirmation function, ‘c-function’ 
for short, is introduced. This is a numerical function which assigns a real number 
between 0 and 1 to any pair of sentences. If c is a function of this kind, ‘c(h,e) = r’
means: ‘the degree of confirmation of the hypothesis h, on the basis of the evidence e,
is r’. The class of regular c-functions is defined as an infinite, very comprehensive class 
of functions of the kind described. The most fundamental part of inductive logic con- 
sists of those theorems which hold for all regular c-functions (chap. v); among them is 
the famous and much debated theorem of Bayes. Later (in chap. viii), theorems for a 
narrower class of functions are proved, the so-called symmetrical c-functions. Among 
the latter theorems are two of the most important results of the classical theory of 
probability, viz., the binomial law and Bernoulli’s theorem; but in the context of our 
theory the interpretation of these theorems is modified. If a new item of information is 
added to the available body of evidence, then the degree of confirmation for a given 
hypothesis either increases, or decreases, or remains unchanged. The new information 
is then called positively relevant for the hypothesis, or negatively relevant, or ir- 
relevant, respectively. Theorems are developed concerning these relevance concepts, 
and also concerning a quantitative measure of relevance which represents the amount 
of positive or negative relevance (chap. vi). 
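
The notions of a c-function and of relevance can be made concrete by a small sketch. Two assumptions go beyond what the text states here: that c(h,e) is obtained from a measure m over state-descriptions as m(e and h) / m(e), and that m gives equal weight to every state-description. The second choice is made only for simplicity and is not offered as the preferred explicatum. All names (`m`, `c`, `relevance`, `ATOMS`) are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy language: three atomic sentences about individuals a, b, c.
ATOMS = ("Pa", "Pb", "Pc")
STATES = [dict(zip(ATOMS, values))
          for values in product((True, False), repeat=len(ATOMS))]

# Assumed measure: equal positive weight on each state-description.
WEIGHTS = [Fraction(1, len(STATES))] * len(STATES)


def m(sentence):
    """Total weight of the state-descriptions in which the sentence holds."""
    return sum((w for sd, w in zip(STATES, WEIGHTS) if sentence(sd)),
               Fraction(0))


def c(h, e):
    """Degree of confirmation of h on e, taken here as m(e and h) / m(e)."""
    m_e = m(e)
    if m_e == 0:
        raise ValueError("the evidence must hold in at least one state-description")
    return m(lambda sd: e(sd) and h(sd)) / m_e


def relevance(i, h, e):
    """Classify the new information i for hypothesis h, given prior evidence e."""
    before = c(h, e)
    after = c(h, lambda sd: e(sd) and i(sd))
    if after > before:
        return "positively relevant"
    if after < before:
        return "negatively relevant"
    return "irrelevant"


tautology = lambda sd: True            # evidence that excludes nothing
h = lambda sd: sd["Pa"] and sd["Pb"]   # hypothesis 'Pa and Pb'

print(c(h, tautology))                                   # 1/4
print(relevance(lambda sd: sd["Pa"], h, tautology))      # positively relevant
print(relevance(lambda sd: not sd["Pa"], h, tautology))  # negatively relevant
print(relevance(lambda sd: sd["Pc"], h, tautology))      # irrelevant
```

With this particular measure, information about one individual never bears on a hypothesis about another, which is why ‘Pc’ comes out irrelevant to ‘Pa and Pb’; a different regular c-function could classify the same information differently.
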

Some students of probability believe that the logical concept of probability₁ cannot
be explicated by a quantitative concept of degree of confirmation, i.e., one with numerical
values. The best we can hope for, they think, is to find an explicatum of comparative
form, e.g., ‘the hypothesis h is more strongly confirmed by the evidence e than
h′ is by e′’. Although I do not share this skeptical view, I think that a comparative
concept of confirmation is of interest. A definition for a concept of this kind is given 
which does not involve any quantitative concepts, and a system of comparative induc- 
tive logic is constructed on the basis of this definition (chap. vii). The final chapter of 
this volume (chap. ix) investigates the problem of estimation. This problem belongs to
the most important problems connected with inductive reasoning. The investigations 
of contemporary statisticians concerning sampling and estimation have led to many 
interesting and fruitful results. However, there is no agreement among them concern- 
ing the logical nature of estimation and the validity of particular methods of estima- 
tion. A new approach to the problem is here proposed within the framework of our sys- 
tem of inductive logic. A general estimate-function is defined with the help of the con- 
cept of degree of confirmation. This procedure supplies the needed logical foundation 
for a general theory of estimation. Then, in particular, the application of the general 
estimate-function for the estimation of frequencies is investigated. 
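
The text says only that the general estimate-function is defined with the help of the concept of degree of confirmation. One natural reading, given here as an assumption and not as a quotation of the definition in chapter ix, is a weighted mean: if the unknown magnitude u can take the values r_1, ..., r_k, and h_n is the hypothesis that u has the value r_n, the estimate of u on the evidence e would be

```latex
\[
  \operatorname{est}(u, e) \;=\; \sum_{n=1}^{k} r_n \, c(h_n, e)
\]
```

The symbol \operatorname{est} and the indices are chosen for this illustration only. Under this reading, an estimate of a relative frequency is the special case in which the r_n are the possible frequency values.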

Some new books discussing the problems of probability and induction were pub- 
lished in recent years, after the writing of the manuscript of the present volume was 
finished. They are therefore not discussed here, or only briefly. The most important 
ones are those by William Kneale, C. I. Lewis, and Bertrand Russell (see Bibliogra- 
phy). I am especially gratified by the great similarity between the conceptions of the 
nature of the logical concept of probability which were developed independently by 
Lewis and myself. Lewis does not try to construct a technical explication of probabil- 
ity; but he gives a detailed and thoroughgoing analysis of the role of probability in the 
whole system of our empirical knowledge and, in particular, in the interpretation and 
confirmation of statements about the world of things in terms of expectations concern- 
ing future observations. This analysis, which connects probability and epistemology 
more intimately than has been done so far by philosophers, is a very valuable help in 
the clarification of contemporary discussions in both fields. 

The second volume, now in preparation, will have chiefly two tasks. The first will 
be to continue the construction of inductive logic begun in this volume. While the 
theorems here developed refer to general classes of c-functions, in the second volume 
one particular c-function, symbolized by ‘c*’, will be selected as our quantitative
explicatum for probability₁, our representative of the concept of degree of confirmation.
The theorems in the present volume can only be of conditional form (e.g., the
special addition principle: “if a c-function has the values r₁ and r₂, respectively, for two
incompatible hypotheses h₁ and h₂ on the basis of the evidence e, then it has the value
r₁ + r₂ for the disjunction h₁ ∨ h₂ on the same evidence”). On the other hand, it will
be possible to state theorems concerning the function c*, proved on the basis of its 
definition, which enable us actually to compute the value of this function for any two 
given sentences (within our simple language systems). A brief summary of the theory 
of c*, stating the definition and some of the theorems, is given in the Appendix to the 
present volume, § 110. It is not claimed that c* is necessarily the best explicatum pos- 
sible. The theory of this function will be developed chiefly for the purpose of presenting 
a concrete example of a quantitative system of inductive logic which is complete (with
respect to the simple language systems chosen). Furthermore, the results found for c* 
give occasion for discussions of general problems concerning inductive logic. Thus, for 
example, the problem of the confirmation of a universal law on the basis of a finite 
number of observational results will be discussed in detail in this context; and also the 
question whether a scientific, inductive procedure leading to a prediction of a single 
event must necessarily involve universal laws, as is usually assumed. 
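
The special addition principle, cited in this paragraph as an example of a theorem of conditional form, can be written symbolically as follows. The notation is a sketch: h_1 and h_2 stand for the two incompatible hypotheses, e for the evidence, and r_1, r_2 for the two given values.

```latex
\[
  c(h_1, e) = r_1, \qquad c(h_2, e) = r_2, \qquad
  h_1 \text{ and } h_2 \text{ incompatible on } e
  \;\;\Longrightarrow\;\;
  c(h_1 \vee h_2,\, e) = r_1 + r_2 .
\]
```

The conditional form is visible in the formula: nothing is said about what the values r_1 and r_2 are, only how the value for the disjunction depends on them.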

The second main task of the second volume will be to develop general procedures for 
comparing the goodness of inductive methods. The procedures are general in the sense 
of being applicable not only to certain methods that have actually been proposed 
(among them we shall discuss, e.g., Laplace’s rule of succession, R. A. Fisher’s maxi- 
mum likelihood method of estimation, Reichenbach’s rule of induction, our system of 
c*, and others) but also to any other inductive methods that might be proposed or con- 
sidered. The comparisons will be made not with respect to the reasons offered for the 
choice of an inductive method by its author but rather with respect to the results to
which the methods lead; more specifically, we shall examine in this context not the 
philosophical soundness of the basic conceptions underlying any given inductive meth- 
od but rather the successful application of the given method in competition with an- 
other method. We might, for example, consider a possible universe with a given struc- 
ture, represented by a state-description; we imagine that two men as representatives of 
two different given inductive methods make a comprehensive system of wagers. Each 
of these wagers is based on the common knowledge of some part of the assumed world 
and refers to a hypothesis concerning an unknown individual; and each wager is made 
in such a way that it is judged by each of the two men as favorable to himself from the 
point of view of his inductive method. On the basis of the given state-description we 
can determine for each wager who of the two men wins; hence we can calculate the 
over-all balance for the total system of wagers, by which all parts of the world are 
covered in turn. Carrying out this procedure for all possible worlds, i.e., state-descrip- 
tions, we shall determine in which of them the one inductive method is more successful 
and in which the other. We shall find that for any two given inductive methods, no
matter how inadequate the first may appear to us in comparison with the second, there 
are always some state-descriptions in which the first wins out against the second. 
Hence we can never say of one method that it is absolutely inferior to another method 
in the sense of being inferior in every conceivable world. Nevertheless, the result of a 
comparison of two inductive methods in the manner indicated may practically influ- 
ence our preference. Suppose, for example, that in comparing two given inductive 
methods we find that the number of those state-descriptions in which the second meth- 
od is more successful is a million times as large as the number of those in which the first 
method is more successful. Then it may well be that this result would influence us 
against regarding the first method as more adequate than the second and against 
choosing the first in preference to the second for determining our practical decisions in 
the actual world, whose total structure is not known to us and for which we therefore 
cannot know which of the two inductive methods would be more successful in the long 
run.

The discussion of procedures for comparing the success of given inductive methods 
will naturally lead to the question whether an investigation of this kind must neces- 
sarily be restricted to the few known inductive methods or whether it can be general- 
ized. The known methods are, so to speak, arbitrarily selected by historical accident 
from the totality of possible inductive methods. This totality is not a system of discrete 
entities but a continuum. If we could characterize each method by a few, say n, char- 
acteristic numbers or parameters, then each method would be represented by a point in 
an n-dimensional continuous space. This would enable us to develop a general theory of 
inductive methods in a simple form. We might then, for example, investigate the 
changes which a given inductive method would undergo if we changed its parameter 
values in a certain way. A system of this kind will be developed. It will contain, though 
not all conceivable inductive methods, still a very comprehensive infinite class of them, 
including all known inductive methods (among them those mentioned above) and all 
those others which are even remotely similar in their general structure to those known. 
It will turn out, surprisingly, that this can be done with the help of only two parameters;
with respect to any given fixed language system, one parameter is sufficient. This
parameter will be denoted by ‘λ’; and the system of inductive methods will be called
the λ-system. If a language system 𝔏 is given, then any inductive method for 𝔏 is completely
characterized by its λ-value in the following sense: the one number λ determines
uniquely the degree of confirmation of any hypothesis with respect to any evidence
expressible in 𝔏 and the estimate for the relative frequency of a property in a class
of individuals on the basis of any evidence expressible in 𝔏; in other words, two inductive
methods have the same λ only if they always lead to the same numerical results
of the kinds just described. The λ-system will enable us to analyze in a relatively
simple way various inductive methods which we want to consider. Furthermore, it 
becomes possible to solve problems of a new kind, viz., to construct inductive methods 
which are most suitable for given purposes. For example, suppose that a description of 
a possible world representing a certain structure is given; suppose further that we 
choose some procedure for measuring the successfulness of inductive methods within 
possible worlds (e.g., by the over-all balance of a system of wagers covering the whole 
world, as previously indicated, or by determining the errors of estimations of relative 
frequency in many classes covering again the whole world). The measure of over-all 
success S for the given world will depend upon the inductive method applied. Since 
now each inductive method is completely characterized by its λ, we can represent S
as a function of λ alone: S(λ). Then it is easily possible to determine that value of λ
for which S(λ) has its maximum; in other words, to construct that particular inductive
method which is most successful for the given world. The surprising fact that this and
similar problems can now be solved in this simple way is a consequence of the application
of the λ-system, in which the various inductive methods are no longer regarded as
separate entities with incomparable features but as elements in a continuum that is 
numerically controlled. 
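
The idea of representing success as a function S(λ) and choosing the λ that maximizes it can be sketched as follows. Several things here are assumptions that this preface does not state: the formula (s_M + (w/κ)λ) / (s + λ) for the degree of confirmation that the next individual has a property M (with s individuals observed, s_M of them having M, and w/κ the relative logical width of M), the choice of w = 1 and κ = 2, the ordering of the individuals in the world, and the particular success measure (negative mean squared error of successive predictions). The function names are hypothetical.

```python
from fractions import Fraction


def c_lambda(s_m, s, lam, w=1, kappa=2):
    """Assumed lambda-rule: confirmation that the next individual has M,
    given s observed individuals of which s_m have M."""
    if s + lam == 0:
        raise ValueError("undefined for s = 0 and lambda = 0")
    return (Fraction(s_m) + Fraction(w, kappa) * lam) / (s + lam)


def success(world, lam):
    """Hypothetical S(lambda): negative mean squared error when each
    individual is predicted from the individuals that precede it."""
    total = 0
    seen_m = 0
    for s, has_m in enumerate(world):
        prediction = c_lambda(seen_m, s, lam)
        total += (int(has_m) - prediction) ** 2
        seen_m += int(has_m)
    return -total / len(world)


def best_lambda(world, candidates):
    """The candidate value of lambda for which S(lambda) is largest."""
    return max(candidates, key=lambda lam: success(world, lam))


candidates = [Fraction(1, 10), 1, 2, 8]   # lambda must be positive here
uniform_world = [True] * 10               # every individual has M
alternating_world = [True, False] * 5     # M and non-M alternate

print(c_lambda(3, 4, 2))                           # 2/3
print(best_lambda(uniform_world, candidates))      # 1/10
print(best_lambda(alternating_world, candidates))  # 8
```

In this sketch a small λ lets the observed frequency dominate, which pays off in the uniform world, while a large λ keeps predictions near w/κ, which pays off in the alternating world. That no single value wins in every world agrees with the earlier remark that no method is inferior in every conceivable world.
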

The second volume will also contain investigations of various other problems, espe- 
cially those connected with the task of extending our system of inductive logic to richer 
language systems and finally to the whole quantitative language of science. For most 
of these problems no complete solutions will be offered. A tentative solution will be 
proposed for the first step in the extension, viz., a language system in which the indi- 
viduals belong to a discrete linear order, which may be regarded as a temporal sequence 
of events (cf. § 15B). In a system of this kind new inductive problems arise because 
regularities of temporal succession become relevant for the degree of confirmation. The 
concept of random order for a system of this kind will be defined, or rather the quanti- 
tative concept of the degree of randomness of a given order, and its opposite, the degree 
of uniformity. This will lead to a new definition of degree of confirmation suitable for 
the extended system. With the help of these concepts it will be possible to formulate 
and discuss the problem of the assumption of the uniformity of the world and its al- 
leged necessity for the validity of inductive reasoning in a more exact way than in the 
present volume (§ 41F).


CHAPTER I


ON EXPLICATION 


After a brief indication of the problems to be dealt with in this book—the 
problems of degree of confirmation, induction, and probability (§ 1)—the re- 
mainder of this chapter contains a discussion of some general questions of a 
methodological nature. By an explication we understand the transformation of 
an inexact, prescientific concept, the explicandum, into an exact concept, the 
explicatum (§ 2). The explicatum must fulfil the requirements of similarity to 
the explicandum, exactness, fruitfulness, and simplicity (§ 3). Three kinds of 
concepts are distinguished: classificatory (e.g., Warm), comparative (e.g.,
Warmer), and quantitative concepts (e.g., Temperature) (§ 4). The role of com- 
parative and quantitative concepts as explicata is discussed (§ 5). The axiomat- 
ic method is briefly characterized, and the distinction between its two phases, 
formalization and interpretation, is especially emphasized (§ 6). In this chap- 
ter the methodological questions are discussed in a general way, without refer- 
ence to the specific problems of this book. Only in later chapters will the results 
of these preliminary explanations be applied in the discussions concerning con- 
firmation and probability. 


§ 1. Introduction: Our Problems 


A brief, preliminary indication is given of the tasks which this book will try 
to solve: a clarification of (1) degree of confirmation, (2) induction, (3) prob- 
ability. 


The chief tasks of this book will be: 
1) a clarification and, if possible, a definition of the concept of degree of 
confirmation; 
2) a clarification of the logical nature of induction and, if possible, a con- 
struction of a system of inductive logic; 
3) a clarification of the concept of probability. 
At the present only a few preliminary explanations of these problems will
be given.

1. When scientists speak about a scientific law or a theory, or also a 
singular statement, for example, a prediction, on the one hand, and cer- 
tain observational data or experimental results, on the other, they often 
state a relation between those items in forms like these: 

a. ‘This experiment again confirms the theory T’ (or: ‘. . . supplies 
new evidence for . . .’). 
b. ‘The quantum theory is confirmed to a considerably higher degree
by the experimental data known today than by those available twenty 
years ago’ (or: ‘. . . is supported more strongly by . . .’). 

The concepts of confirming evidence or degree of confirmation used in 
statements of this kind are usually sufficiently well understood for simple, 
practical purposes, but they are hardly ever precisely explained. It will be 
one of the chief tasks of this book to make concepts of this kind precise 
and to furnish a theory of the logical relations between any hypothesis 
and any piece of knowledge that might be regarded as confirming evi- 
dence for the hypothesis. 

2. The problem of induction in the widest sense—concerning a hy- 
pothesis of any, not necessarily universal form—is essentially the same 
as the problem of the logical relation between a hypothesis and some con- 
firming evidence for it. Thus, by laying down a definition for the concept 
of degree of confirmation and constructing a logical theory based upon 
this concept, we shall furnish a system of inductive logic. While deductive 
logic may be regarded as the theory based upon the concept of logical con- 
sequence or deducibility, inductive logic is the theory based upon what 
might be called the degree of inducibility, that is, the degree of confirma- 
tion. 

3. The problem of probability is likewise closely related to that of in- 
duction. This has often been observed, at least with respect to one of the 
various conceptions of probability which we find in the historical develop- 
ment (sometimes called inductive probability). We shall try to show that 
we have to distinguish chiefly two concepts of probability; the one is de- 
fined in terms of frequency and is applied empirically, the other is a logical 
concept and is the same as degree of confirmation. It will be shown that 
both are important for the method of science, and thus the controversy 
between the two “conceptions” of probability will be dissolved. 

Thus we see that one or several of the problems which we intend to ap- 
proach have the following character. There is a certain term (‘confirming 
evidence’, ‘degree of confirmation’, ‘probability’) which is used in every- 
day language and by scientists without being exactly defined, and we try 
to make the use of these terms more precise or, as we shall say, to give an 
explication for them. The task of explication is of very general importance 
for the construction of concepts. Therefore we shall devote the remainder 
of this chapter (§§ 2-6) to a discussion of the general nature of the method 
of explication and only in the next chapter (§ 8) return to our specific 
problems of confirmation and probability. 


§ 2. On the Clarification of an Explicandum


By the procedure of explication we mean the transformation of an inexact, 
prescientific concept, the explicandum, into a new exact concept, the explica- 
tum. Although the explicandum cannot be given in exact terms, it should be 
made as clear as possible by informal explanations and examples. 


The task of explication consists in transforming a given more or less 
inexact concept into an exact one or, rather, in replacing the first by the 
second. We call the given concept (or the term used for it) the explican- 
dum, and the exact concept proposed to take the place of the first (or 
the term proposed for it) the explicatum. The explicandum may belong 
to everyday language or to a previous stage in the development of scien- 
tific language. The explicatum must be given by explicit rules for its use, 
for example, by a definition which incorporates it into a well-constructed 
system of scientific either logicomathematical or empirical concepts. 


The term ‘explicatum’ has been suggested by the following two usages. Kant 
calls a judgment explicative if the predicate is obtained by analysis of the sub- 
ject. Husserl, in speaking about the synthesis of identification between a con- 
fused, nonarticulated sense and a subsequently intended distinct, articulated 
sense, calls the latter the ‘Explikat’ of the former. (For both uses see Dictionary 
of philosophy [1942], ed. D. Runes, p. 105). What I mean by ‘explicandum’ and 
‘explicatum’ is to some extent similar to what C. H. Langford calls ‘analysan- 
dum’ and ‘analysans’: “the analysis then states an appropriate relation of 
equivalence between the analysandum and the analysans” (“The notion of 
analysis in Moore’s philosophy”, in The philosophy of G. E. Moore [1943], ed. 
P. A. Schilpp, pp. 321-42; see p. 323); he says that the motive of an analysis 
“is usually that of supplanting a relatively vague idea by a more precise one” 
(ibid., p. 329).

(Perhaps the form ‘explicans’ might be considered instead of ‘explicatum’; 
however, I think that the analogy with the terms ‘definiendum’ and ‘definiens’ 
would not be useful because, if the explication consists in giving an explicit 
definition, then both the definiens and the definiendum in this definition express 
the explicatum, while the explicandum does not occur.) The procedure of ex- 
plication is here understood in a wider sense than the procedures of analysis and 
clarification which Kant, Husserl, and Langford have in mind. The explicatum 
(in my sense) is in many cases the result of an analysis of the explicandum (and 
this has motivated my choice of the terms); in other cases, however, it deviates 
deliberately from the explicandum but still takes its place in some way; this 
will become clear by the subsequent examples. 


A problem of explication is characteristically different from ordinary 
scientific (logical or empirical) problems, where both the datum and the 
solution are, under favorable conditions, formulated in exact terms (for 
example, ‘What is the product of 3 and 5?’, ‘What happens when an electric
current goes through water?’). In a problem of explication the datum,
viz., the explicandum, is not given in exact terms; if it were, no explication
would be necessary. Since the datum is inexact, the problem itself is not 
stated in exact terms; and yet we are asked to give an exact solution. This 
is one of the puzzling peculiarities of explication. It follows that, if a solu- 
tion for a problem of explication is proposed, we cannot decide in an exact 
way whether it is right or wrong. Strictly speaking, the question whether 
the solution is right or wrong makes no good sense because there is no 
clear-cut answer. The question should rather be whether the proposed 
solution is satisfactory, whether it is more satisfactory than another one, 
and the like. What is meant by these questions will soon be made clearer. 

Before we turn to the chief question, viz., what are the requirements for 
a satisfactory solution of a problem of explication, that is to say, for a 
satisfactory explicatum, let us look somewhat more at the way in which 
the problem is to be stated, that is, how the explicandum is to be given. 
There is a temptation to think that, since the explicandum cannot be given 
in exact terms anyway, it does not matter much how we formulate the 
problem. But this would be quite wrong. On the contrary, since even in 
the best case we cannot reach full exactness, we must, in order to prevent 
the discussion of the problem from becoming entirely futile, do all we can 
to make at least practically clear what is meant as the explicandum. What 
X means by a certain term in contexts of a certain kind is at least practically
clear to Y if Y is able to predict correctly X’s interpretation for
most of the simple, ordinary cases of the use of the term in those contexts. 
It seems to me that, in raising problems of analysis or explication, philoso- 
phers very frequently violate this requirement. They ask questions like: 
‘What is causality?’, ‘What is life?’, ‘What is mind?’, ‘What is justice?’, 
etc. Then they often immediately start to look for an answer without first 
examining the tacit assumption that the terms of the question are at least 
practically clear enough to serve as a basis for an investigation, for an 
analysis or explication. Even though the terms in question are unsystem- 
atic, inexact terms, there are means for reaching a relatively good 
mutual understanding as to their intended meaning. An indication of the 
meaning with the help of some examples for its intended use and other 
examples for uses not now intended can help the understanding. An in- 
formal explanation in general terms may be added. All explanations of 
this kind serve only to make clear what is meant as the explicandum; they 
do not yet supply an explication, say, a definition of the explicatum; they 
belong still to the formulation of the problem, not yet to the construction 
of an answer. (Examples. 1. I might say, for example: “I mean by the explicandum
‘salt’, not its wide sense which it has in chemistry but its narrow
sense in which it is used in the household language”. This explanation
is not yet an explication; the latter may be given, for instance, by the
compound expression ‘sodium chloride’ or the synonymous symbol ‘NaCl’ 
of the language of chemistry. 2. “I am looking for an explication of the 
term ‘true’, not as used in phrases like ‘a true democracy’, ‘a true friend’, 
etc., but as used in everyday life, in legal proceedings, in logic, and in sci- 
ence, in about the sense of ‘correct’, ‘accurate’, ‘veridical’, ‘not false’, 
‘neither error nor lie’, as applied to statements, assertions, reports, stories,
etc.” This explanation is not yet an explication; an explication may be
given by a definition within the framework of semantical concepts, for
example, by Tarski’s definition of ‘true’ in [Wahrheitsbegriff] (for abbreviated
titles in square brackets see the Bibliography at the end of this volume),
or by D17-1 below. By explanations of this kind the reader may
obtain step by step a clearer picture of what is intended to be included 
and what is intended to be excluded; thus he may reach an understanding 
of the meaning intended which is far from perfect theoretically but may 
be sufficient for the practical purposes of a discussion of possible explica- 
tions. 


§ 3. Requirements for an Explicatum


A concept must fulfil the following requirements in order to be an adequate
explicatum for a given explicandum: (1) similarity to the explicandum, (2)
exactness, (3) fruitfulness, (4) simplicity.

Suppose we wish to explicate a certain prescientific concept, which has 
been sufficiently clarified by examples and explanations as just discussed. 
What is the explication of this concept intended to achieve? To say that 
the given prescientific concept is to be transformed into an exact one 
means, of course, that an exact concept corresponding to the given con- 
cept is to be introduced. What kind of correspondence is required here be- 
tween the first concept, the explicandum, and the second, the explicatum? 

Since the explicandum is more or less vague and certainly more so than 
the explicatum, it is obvious that we cannot require the correspondence 
between the two concepts to be a complete coincidence. But one might 
perhaps think that the explicatum should be as close to or as similar with 
the explicandum as the latter’s vagueness permits. However, it is easily 
seen that this requirement would be too strong, that the actual procedure 
of scientists is often not in agreement with it, and for good reasons. Let us 
consider as an example the prescientific term ‘fish’. In the construction of 
a systematic language of zoölogy, the concept Fish designated by this
term has been replaced by a scientific concept designated by the same term
‘fish’; let us use for the latter concept the term ‘piscis’ in order to avoid
confusion. When we compare the explicandum Fish with the explicatum
Piscis, we see that they do not even approximately coincide. The latter is
much narrower than the former; many kinds of animals which were sub- 
sumed under the concept Fish, for instance, whales and seals, are ex- 
cluded from the concept Piscis. [The situation is not adequately described 
by the statement: ‘The previous belief that whales (in German even called 
‘Walfische’) are also fish is refuted by zoölogy’. The prescientific term
‘fish’ was meant in about the sense of ‘animal living in water’; therefore 
its application to whales, etc., was entirely correct. The change which zo- 
ologists brought about in this point was not a correction in the field of 
factual knowledge but a change in the rules of the language; this change, 
it is true, was motivated by factual discoveries.] That the explicandum 
Fish has been replaced by the explicatum Piscis does not mean that the 
former term can always be replaced by the latter; because of the differ- 
ence in meaning just mentioned, this is obviously not the case. The former 
concept has been succeeded by the latter in this sense: the former is no 
longer necessary in scientific talk; most of what previously was said with 
the former can now be said with the help of the latter (though often in a 
different form, not by simple replacement). It is important to recognize 
both the conventional and the factual components in the procedure of the 
zoölogists. The conventional component consists in the fact that they
could have proceeded in a different way. Instead of the concept Piscis 
they could have chosen another concept (let us use for it the term
‘piscis*’) which would likewise be exactly defined but which would be
much more similar to the prescientific concept Fish by not excluding 
whales, seals, etc. What was their motive for not even considering a wider 
concept like Piscis* and instead artificially constructing the new concept 
Piscis far remote from any concept in the prescientific language? The rea- 
son was that they realized the fact that the concept Piscis promised to be 
much more fruitful than any concept more similar to Fish. A scientific 
concept is the more fruitful the more it can be brought into connection 
with other concepts on the basis of observed facts; in other words, the 
more it can be used for the formulation of laws. The zoölogists found that
the animals to which the concept Fish applies, that is, those living in 
water, have by far not as many other properties in common as the animals 
which live in water, are cold-blooded vertebrates, and have gills through- 
out life. Hence the concept Piscis defined by these latter properties al- 
lows more general statements than any concept defined so as to be more 
similar to Fish; and this is what makes the concept Piscis more fruitful. 


In addition to fruitfulness, scientists appreciate simplicity in their concepts.
The simplicity of a concept may be measured, in the first place, by
the simplicity of the form of its definition and, second, by the simplicity 
of the forms of the laws connecting it with other concepts. This property, 
however, is only of secondary importance. Many complicated concepts 
are introduced by scientists and turn out to be very useful. In general, 
simplicity comes into consideration only in a case where there is a question 
of choice among several concepts which achieve about the same and seem 
to be equally fruitful; if these concepts show a marked difference in the 
degree of simplicity, the scientist will, as a rule, prefer the simplest of 
them. 

According to these considerations, the task of explication may be characterized
as follows. If a concept is given as explicandum, the task consists
in finding another concept as its explicatum which fulfils the following
requirements to a sufficient degree.

1. The explicatum is to be similar to the explicandum in such a way 
that, in most cases in which the explicandum has so far been used, the 
explicatum can be used; however, close similarity is not required, and 
considerable differences are permitted. 

2. The characterization of the explicatum, that is, the rules of its use 
(for instance, in the form of a definition), is to be given in an exact form, 
so as to introduce the explicatum into a well-connected system of scien- 
tific concepts. 

3. The explicatum is to be a fruitful concept, that is, useful for the 
formulation of many universal statements (empirical laws in the case of a 
nonlogical concept, logical theorems in the case of a logical concept). 

4. The explicatum should be as simple as possible; this means as simple 
as the more important requirements (1), (2), and (3) permit. 

Philosophers, scientists, and mathematicians make explications very fre- 
quently. But they do not often discuss explicitly the general rules which they 
follow implicitly. A good explicit formulation is given by Karl Menger in con- 
nection with his explication of the concept of dimension (“What is dimension?” 
Amer. Math. Monthly, 50 [1943], 2-7; see p. 5: § 3 “Criteria for a satisfactory 
definition” [explication, in our terminology]). He states the following require- 
ments. The explicatum “must include all entities which are always denoted and 
must exclude all entities which are never denoted” by the explicandum. The 
explication “should extend the use of the word by dealing with objects not 
known or not dealt with in ordinary language. With regard to such entities, a 
definition [explication] cannot help being arbitrary.” The explication “must 
yield many consequences,” theorems possessing “generality and simplicity” and 
connecting the explicatum with concepts of other theories. See also the discussions
by C. H. Langford, referred to in § 2.

Terminological remarks. 1. The word ‘concept’ is used in this book as a convenient
common designation for properties, relations, and functions. [Note that
(a) it does not refer to terms, i.e., words or phrases, but to their meanings, and 
(b) it does not refer to mental occurrences of conceiving but to something ob- 
jective.] For more detailed explanations see [Semantics], p. 230; [Meaning], 
p. 21. 2. If I speak about an expression (e.g., a word, a phrase, a sentence, etc.)
in distinction to what is meant or designated by it, I include it in quotation
marks. That this distinction is necessary in order to avoid confusion has become 
more and more clear in the recent development of logic and analysis of language. 
3. If I want to speak about a concept (property, relation, or function) designated
by a word, I sometimes use the device of capitalizing the word, especially
if it is not a noun (compare [Meaning], p. 17 n.). For example, I might write 
‘the relation Warmer’; to write instead ‘the relation warmer’ would look strange 
and be contrary to English grammar; to write ‘the relation of x being warmer 
than y’ would be inconvenient because of its length; the customary way of 
writing ‘the relation ‘warmer’ ’ would not be quite correct, because ‘warmer’ is 
not a relation but a word designating a relation. Similarly, I shall sometimes 
write: ‘the property (or concept) Fish’ (instead of ‘the property of being a fish’); 
‘the property (or concept) Red’ (instead of ‘the property of being red’ or ‘the
property of redness’), and the like. 


Arne Naess defines and uses a concept which seems related to our con- 
cept Explicatum (“Interpretation and preciseness. I. Survey of basic con- 
cepts” [Oslo Universitetets Studentkontor, 1947] [mimeographed]; this 
is the first chapter of a forthcoming book). Naess defines ‘the formula- 
tion U is more precise than T (in the sense that U may with profit be 
substituted for T)’ by ‘there are interpretations of T which are not inter- 
pretations of U, but there are no interpretations of U which are not also 
interpretations of T’ (ibid., p. 38). This comparative concept enables
Naess to deal with a series of consecutive “precisations” of a given con- 
cept. Naess announces that a later chapter (iii) of the book will be “de- 
voted to the question of how to measure degrees of ambiguity, vague- 
ness, and similar properties”. The comparative concept mentioned and 
these quantitative concepts may prove to be effective tools for a more 
penetrating analysis of explication. 
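
The comparative definition quoted from Naess amounts to a proper-subset condition on sets of interpretations. The following is a minimal sketch of that condition in Python, under the assumption (made only for illustration) that the interpretations of a formulation can be listed as a finite set of strings. The function name `more_precise` and the sample interpretations are hypothetical and are not taken from Naess.

```python
def more_precise(u_interps: set, t_interps: set) -> bool:
    """U is more precise than T in the quoted sense: some interpretation
    of T is not an interpretation of U, and every interpretation of U is
    also an interpretation of T."""
    some_t_not_u = bool(t_interps - u_interps)
    all_u_in_t = u_interps <= t_interps
    return some_t_not_u and all_u_in_t


# Hypothetical interpretations of two formulations T and U.
T = {"animal living in water", "cold-blooded vertebrate with gills"}
U = {"cold-blooded vertebrate with gills"}

print(more_precise(U, T))  # U narrows T
print(more_precise(T, U))  # T does not narrow U
print(more_precise(T, T))  # no formulation is more precise than itself
```

On this reading, a series of consecutive precisations is a chain of sets, each properly contained in the one before it.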


§ 4. Classificatory, Comparative, and Quantitative Concepts 


A classificatory concept (e.g., Warm) serves for classifying things into two 
kinds. A comparative concept is a relation based on a comparison, with the sense 
of ‘more (in a certain respect)’ (e.g., Warmer) or ‘more or equal’. A quantitative 
concept serves to describe something with the help of numerical values (e.g.,
temperature).


Among the kinds of concept used in science, three are of special importance.
We call them classificatory, comparative, and quantitative concepts.
We shall make use of this distinction in our later discussion of
confirmation and probability. In prescientific thinking classificatory concepts
are used most frequently. In the course of the development of science
they are replaced in scientific formulations more and more by concepts
of the two other kinds, although they remain always useful for the
formulation of observational results. Classificatory concepts are those 
which serve for the classification of things or cases into two or a few mutu- 
ally exclusive kinds. They are used, for example, when substances are 
divided into metals and nonmetals, and again the metals into iron, cop- 
per, silver, etc.; likewise, when animals and plants are divided into 
classes and further divided into orders, families, genera, and, finally, 
species; when the things surrounding us are described as warm or cold, 
big or small, hard or soft, etc., or when they are classified as houses, 
stones, tables, men, etc. In these examples the classificatory concepts are 
properties. In other cases they are relations, for example, those designated 
by the phrases ‘x is close to y’ and ‘the person x is acquainted with the 
field of science y’. (A relation may be regarded as a property of ordered 
pairs.) Quantitative concepts (also called metrical or numerical concepts 
or numerical functions) are those which serve for characterizing things or 
events or certain of their features by the ascription of numerical values; 
these values are found either directly by measurement or indirectly by 
calculation from other values of the same or other concepts. Examples of 
quantitative concepts are length, length of time, velocity, volume, mass, 
force, temperature, electric charge, price, I.Q., infantile mortality, etc. 
In many cases a quantitative concept corresponds to a classificatory con- 
cept. Thus temperature corresponds to the property Warm; and the con- 
cept of a distance of less than five miles corresponds to the relation of 
proximity. The method of quantitative concepts and hence of measure- 
ment was first used only for physical events but later more and more in 
other fields also, especially in economics and psychology. Quantitative 
concepts are no doubt the most effective instruments in the scientific 
arsenal. Sometimes scientists, especially in the fields of social science and
psychology, hold the view that, in cases where no way is discovered for 
the introduction of a quantitative concept, nothing remains but to use 
concepts of the simplest kind, that is, classificatory ones. Here, however, 
they overlook the possibility and usefulness of comparative concepts,
which, in a sense, stand between the two other kinds. Comparative con- 
cepts (sometimes called topological or order concepts) serve for the for- 
mulation of the result of a comparison in the form of a more-less-statement 
without the use of numerical values. Before the scientific, quantitative 
concept of temperature was introduced, everyday language contained 
comparative concepts. Instead of merely classifying things into a few kinds 
with the help of terms like ‘hot’, ‘warm’, ‘luke-warm’, ‘cold’, a more effective
characterization was possible by saying that x is warmer than y (or
colder, or equally warm, as the case may be). 

A comparative concept is always a relation. If the underlying classifica- 
tory concept is a property (e.g., Warm), the comparative concept is a 
dyadic relation, that is, one with two arguments (e.g., Warmer). If the 
classificatory concept is a dyadic relation (e.g., the relation of x being ac- 
quainted with (the field) y), the comparative concept has, in general, 
four arguments (e.g., the relation of x being better acquainted with y than 
u with v). It is sometimes useful to regard the tetradic relation as a dyadic 
relation between two pairs. (We might say, for example: ‘the relation of 
being acquainted holds for the pair x, y to a higher degree than for the 
pair u, v’.) Sometimes the introduction of a triadic relation is preferred
to that of a tetradic relation. If we do not know how to compare the de- 
gree of Peter’s knowledge in physics with Jack’s knowledge in history, we 
might perhaps be content to use either or both of the two triadic relations 
expressed by the following phrases: ‘x is better acquainted with (the field) 
y than with v’, ‘x is better acquainted with y than w’. The first of these 
two relations requires that we are able to compare the degree of Peter’s 
knowledge in physics with that in history, which might seem problemati- 
cal. The second relation involves the comparison of Peter’s knowledge in 
physics with that of Jack; here it seems easier to invent suitable tests. 

Each of the comparative concepts given above as an example has the 
meaning of ‘more’ or ‘to a higher degree’ with respect to a given classifica- 
tory concept. To any of those classificatory concepts (e.g., Warm), we 
can likewise construct a comparative concept meaning ‘less’ or ‘to a lower 
degree’ (e.g., Less-warm; in other words, Colder); this is the converse of 
the first comparative concept. In either case the comparative concept, 
regarded as a dyadic relation (of simple entities, pairs, etc.), has obviously 
the following relational properties: it is irreflexive, transitive, and (hence) 
asymmetric. (For definitions of these and other terms of the theory of re- 
lations see D25-2.) 
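
The three relational properties just named can be checked mechanically on a finite example. The sketch below is an illustration only: it assumes a small hypothetical domain of bodies and, for convenience, generates the relation Warmer from made-up temperature values (the text itself treats Warmer as prior to any quantitative concept). All names and numbers are invented for the example.

```python
from itertools import product


def is_irreflexive(rel, domain):
    """No element bears the relation to itself."""
    return all((x, x) not in rel for x in domain)


def is_transitive(rel, domain):
    """Whenever x R y and y R z, also x R z."""
    return all(
        (x, z) in rel
        for x, y, z in product(domain, repeat=3)
        if (x, y) in rel and (y, z) in rel
    )


def is_asymmetric(rel):
    """If x R y, then not y R x."""
    return all((y, x) not in rel for (x, y) in rel)


# Hypothetical bodies with illustrative temperature values.
temperature = {"ice": 0, "room": 20, "tea": 70}
domain = list(temperature)

# 'Warmer' as a set of ordered pairs; 'Colder' is its converse.
warmer = {(x, y) for x, y in product(domain, repeat=2)
          if temperature[x] > temperature[y]}
colder = {(y, x) for (x, y) in warmer}

for name, rel in (("Warmer", warmer), ("Colder", colder)):
    print(name, is_irreflexive(rel, domain),
          is_transitive(rel, domain), is_asymmetric(rel))
```

Both the relation and its converse pass all three checks in this example, in agreement with the statement that the properties hold in either case.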

In addition to the form of comparative concepts just mentioned, there 
is another form, less customary but often more useful. A concept of this 
second kind does not mean ‘more’ but ‘more or equal’ with respect to the 
underlying classificatory concept, in other words, ‘to at least the same 
degree’, that is, ‘to the same or a higher degree’ (e.g., the relation of x be- 
ing at least as warm as y). Or it may mean ‘less or equal’ (e.g., the rela- 
tion of x being less warm than y or equally as warm as y; in other words, 
of x being at most as warm as y). It is easily seen that a comparative concept
of this second kind, regarded as a dyadic relation, is reflexive and
transitive but neither symmetric nor asymmetric. A comparative relation 
is sometimes of such a kind that, for any x and y, it holds either between 
x and y or between y and x (or both). In this case the relation (for example,
Warmer-Or-Equally-Warm) orders its members in a kind of linear order. If,
however, the condition is not fulfilled, then there are incomparable cases. 
Thus it might perhaps be that we find it possible to compare the scientific 
achievements of two persons if both work in the same field, while we do not 
know a way of comparing a physicist with a historian. 
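
The difference between a relation that holds one way or the other for every pair and one that leaves incomparable cases can be made concrete. The following sketch assumes, purely for illustration, three hypothetical persons with a field and an invented achievement score, and a relation of the second form that is defined only within a single field.

```python
from itertools import combinations, product

# Hypothetical persons: (field, illustrative achievement score).
people = {
    "Peter": ("physics", 3),
    "Paul": ("physics", 5),
    "Jack": ("history", 4),
}


def at_least_as_accomplished(x, y):
    """'More or equal' form; assumed to hold only within one field."""
    field_x, score_x = people[x]
    field_y, score_y = people[y]
    return field_x == field_y and score_x >= score_y


names = list(people)
R = at_least_as_accomplished

reflexive = all(R(x, x) for x in names)
transitive = all(
    R(x, z)
    for x, y, z in product(names, repeat=3)
    if R(x, y) and R(y, z)
)
# The condition from the text: for any x and y, R holds one way or the other.
connected = all(R(x, y) or R(y, x) for x, y in product(names, repeat=2))
incomparable = [(x, y) for x, y in combinations(names, 2)
                if not R(x, y) and not R(y, x)]

print(reflexive, transitive, connected)
print(incomparable)
```

In this example the relation is reflexive and transitive, but the condition for a linear order fails: the physicists cannot be compared with the historian.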

In everyday language the first form of comparative concept is much 
more customary than the second. There are many single words for those 
of the first form, for instance, ‘above’, ‘beyond’, ‘after’, etc., and especially 
the comparatives, for instance, ‘more’, ‘warmer’, etc., while there are 
hardly any single words for those of the second form. On the other hand, 
there is a general trend in the development of the language of science 
toward concepts which are wider than corresponding concepts of pre- 
scientific language by including extreme cases, especially cases of zero 
value or of identity or equality; for example, the term ‘number’ is now 
taken as including 0, ‘class’ as including the null class, ‘velocity’ as including
the case of rest regarded as velocity 0, etc. With respect to comparative
concepts, this trend means a development from those of the first kind
to those of the second, because the latter include the boundary case of 
equality. One advantage of those of the second kind consists in the fact 
that on the basis of ‘more or equal’ we can define both ‘equal’ and ‘more’ 
(‘x = y’ can be defined by ‘x ≥ y and y ≥ x’; ‘x > y’ by ‘x ≥ y and not
y ≥ x’), while on the basis of ‘more’ we cannot define either ‘equal’ or ‘more
or equal’. For these reasons, when we come to a discussion of a comparative
concept of confirmation (§ 8), we shall take one of the second form, as expressed
by: ‘h is confirmed by e to the same or a higher degree than
h′ by e′’.
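
The two definitions in terms of ‘more or equal’ can be written out directly. This is a minimal sketch, assuming a hypothetical relation `at_least_as_warm` generated from invented temperature values; any relation of the second form could be passed in its place.

```python
def equal(geq, x, y):
    """'x = y' defined as: x >= y and y >= x."""
    return geq(x, y) and geq(y, x)


def more(geq, x, y):
    """'x > y' defined as: x >= y and not y >= x."""
    return geq(x, y) and not geq(y, x)


# Hypothetical bodies with illustrative temperature values.
temperature = {"a": 20, "b": 20, "c": 35}


def at_least_as_warm(x, y):
    return temperature[x] >= temperature[y]


print(equal(at_least_as_warm, "a", "b"))  # a and b are equally warm
print(more(at_least_as_warm, "c", "a"))   # c is warmer than a
print(more(at_least_as_warm, "a", "b"))   # a is not warmer than b
```

Only the single primitive `geq` is needed; both derived concepts are obtained from it by the logical connectives alone.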

For an analysis of comparative and quantitative concepts and an explanation 
of the steps to be taken in the construction of concepts of these kinds see Car- 
nap, Physikalische Begriffsbildung (Karlsruhe, 1926). C. G. Hempel and P. Op- 
penheim have developed and improved the characterizations of the two kinds 
of concept and illustrated their roles in various fields of science in their book 
Der Typusbegriff im Lichte der neuen Logik: Wissenschaftstheoretische Untersuchungen
zur Konstitutionsforschung und Psychologie (Leiden, 1936).


§ 5. Comparative and Quantitative Concepts as Explicata 


The role of comparative and quantitative concepts as explicata is discussed 
in preparation for a later discussion of comparative and quantitative concepts
of confirmation.


Classificatory concepts are the simplest and least effective kind of concept.
Comparative concepts are more powerful, and quantitative concepts
still more; that is to say, they enable us to give a more precise description 
of a concrete situation and, more important, to formulate more compre- 
hensive general laws. Therefore, the historical development of the lan- 
guage is often as follows: a certain feature of events observed in nature is 
first described with the help of a classificatory concept; later a compara- 
tive concept is used instead of or in addition to the classificatory concept; 
and, still later, a quantitative concept is introduced. (These three stages 
of development do, of course, not always occur in this temporal order.) 

The situation may be illustrated with the help of the example of those 
concepts which have led to the quantitative concept of temperature. The 
state of bodies with respect to heat can be described in the simplest and 
crudest way with the help of classificatory concepts like Hot, Warm, and 
Cold (and perhaps a few more). We may imagine an early, not recorded 
stage of the development of our language where only these classificatory 
terms were available. Later, an essential refinement of language took place 
by the introduction of a comparative term like ‘warmer’. In the case of 
this example, as in many others, this second step was already made in the 
prescientific language. Finally, the corresponding quantitative concept, 
that of temperature, was introduced in the construction of the scientific 
language. 

The concept Temperature may be regarded as an explicatum for the 
comparative concept Warmer. The first of the requirements for explicata 
discussed in § 3, that of similarity or correspondence to the explicandum, 
means in the present case the following: The concept Temperature is to 
be such that, in most cases, if x is warmer than y (in the prescientific 
sense, based on the heat sensations of the skin), then the temperature of x 
is higher than that of y. Here a few remarks may be made. 

(i) The requirement refers to most cases, not to all cases. It is easily 
seen that the requirement is fulfilled only in this restricted sense. Suppose 
I enter a moderately heated room twice, first coming from an overheated 
room and at a later time coming from the cold outside. Then it may hap- 
pen that I declare the room, on the basis of my sensations, to be warmer 
the second time than the first, while the thermometer shows at the second 
time the same temperature as at the first (or even a slightly higher one). 
Experiences of this kind do not at all lead us to the conclusion that the 
concept Temperature defined with reference to the thermometer is inade- 
quate as an explicatum for the concept Warmer. On the contrary, we have 
become accustomed to let the scientific concept overrule the prescientific 
one in all cases of disagreement. In other words, the term ‘warmer’ has
undergone a change of meaning. Its meaning was originally based directly 
on a comparison of heat sensations, but, after the acceptance of the scien- 
tific concept Temperature into our everyday language, the word ‘warmer’ 
is used in the sense of ‘having a higher temperature’. Thus the experience 
described above is now formulated as follows: “I believed that the room 
was at the second time warmer than at the first, but this was an error; the 
room was actually not warmer; I found this out with the help of the 
thermometer”. For this second, scientific meaning of ‘warmer’ we shall 
use in the following discussion the term ‘warmer*’.

(ii) The converse of the requirement mentioned above would be this: 
the concept Temperature is to be such that, if x is not warmer than y (in
the prescientific sense), then the temperature of x is not higher than that 
of y. It is important to realize that this is not required, not even “in most 
cases”. When the difference between the temperatures of x and y is small, 
then, as a rule, we notice no difference in our heat sensations. This again 
is not taken as a reason for rejecting the concept Temperature. On the 
contrary, here again we have become accustomed to the new, scientific 
concept Warmer*, and thus we say: “x is actually warmer* than y, al- 
though we cannot feel the difference”. 
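
The asymmetry between the requirement and its converse can be illustrated with a toy model. The sketch assumes, as a deliberate simplification not found in the text, that a difference is felt only when it exceeds a fixed threshold; the threshold value and the sample readings are invented. The model also ignores the adaptation effects described under (i), so in it the requirement holds in all cases rather than merely in most.

```python
THRESHOLD = 2.0  # assumed smallest difference (in degrees) the skin can feel


def feels_warmer(tx, ty):
    """Prescientific Warmer, in this toy model: a clearly felt difference."""
    return tx - ty > THRESHOLD


def warmer_star(tx, ty):
    """Scientific Warmer*: having a higher temperature."""
    return tx > ty


# Hypothetical pairs of temperature readings (x, y).
pairs = [(30.0, 20.0), (21.0, 20.0), (20.0, 25.0)]

# Requirement: if x feels warmer than y, the temperature of x is higher.
requirement_holds = all(
    warmer_star(tx, ty) for tx, ty in pairs if feels_warmer(tx, ty)
)

# Converse: if x does not feel warmer, its temperature is not higher.
converse_failures = [
    (tx, ty) for tx, ty in pairs
    if not feels_warmer(tx, ty) and warmer_star(tx, ty)
]

print(requirement_holds)
print(converse_failures)
```

The pair with a difference of one degree is the case described in the text: x is warmer* than y, although the difference is not felt.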

(iii) Thus, we have two scientific concepts corresponding to the pre- 
scientific concept Warmer. The one is the comparative concept Warmer*, 
the other the quantitative concept Temperature. Either of them may be 
regarded as an explicatum of Warmer. Both are defined with reference to 
the thermometer. Since the thermometer has a higher discriminating pow- 
er than our heat sensations, both scientific concepts are superior to the 
prescientific one in allowing more precise descriptions. The procedure 
leading from the explicandum to either of the two explicata is as follows. 
At first the prescientific concept is guiding us in our choice of an explica- 
tum (with possible exceptions, as discussed earlier). Once an explicatum 
is defined in a relatively simple way, we follow its guidance in cases where 
the prescientific concept is not sufficiently discriminative. It would be 
possible but highly inadvisable to define a concept Temperature in such a 
way that x and y are said to have the same temperature whenever our 
sensations do not show a difference. This concept would be in closer agree- 
ment with the explicandum than the concept Temperature actually used. 
But the latter has the advantage of much greater simplicity both in its 
definition—in other words its method of measurement—and in the laws 
formulated with its help. 

(iv) Of the two scientific terms ‘warmer*’ and ‘temperature’, the latter 
is the one important for science; the former serves merely as a convenient 
abbreviation for ‘having a higher temperature’. The quantitative concept 
Temperature has proved its great fruitfulness by the fact that it occurs in 
many important laws. This is not always the case with quantitative con- 
cepts in science, even if they are well defined by exact rules of measure- 
ment. For instance, it has sometimes occurred in psychology that a quan- 
titative concept was defined by an exact description of tests but that the 
expectation of finding laws connecting the values thus measured with 
values of other concepts was not fulfilled; then the concept was finally 
discarded as not fruitful. If it is a question of an explication of a pre- 
scientific concept, then a situation of the kind described, where we do not 
succeed in finding an adequate quantitative explicatum, ought not to dis- 
courage us altogether from trying an explication. It may be possible to 
find an adequate comparative explicatum. Let us show this by a fictitious 
example. The experience leading to the concept Temperature was first a 
comparative one; it was found that, if x is warmer than y (in the pre- 
scientific sense) and we bring a body of mercury first in contact with x and 
later with y, then it has at the first occasion a greater volume than at the 
second. By a certain device it was made possible to measure the small 
differences in the volume of the mercury; and that was taken as basis 
for the quantitative concept Temperature. Now let us assume fictitiously 
that we did not find technical means for measuring the differences in the 
volume of the mercury, although we were able to observe whether the 
mercury expands or contracts. In this case we should have no basis for a 
quantitative concept Temperature, but it would still be possible to define 
the comparative concept Warmer* with reference to an expansion of the 
mercury. This scientific concept Warmer* could then be taken as explica- 
tum for the prescientific concept Warmer. Here, in the fictitious case, the 
concept Warmer* would be of greater importance than it is in actual physics, 
because it would be the only explicatum. Note that Warmer* here is 
essentially the same concept as Warmer* in the earlier discussion but that 
there is a difference in the form of the two definitions. In the former case 
we defined Warmer* in terms of higher temperature, hence with the help 
of a quantitative concept; here, in the fictitious case, it is defined with ref- 
erence to the comparative concept of the expansion of mercury without 
the use of quantitative concepts. The distinction between these two ways 
of defining a comparative concept, the quantitative way and the purely 
comparative, that is, nonquantitative, way, will be of importance later 
when we discuss the comparative concept of confirmation. 
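
The two ways of defining the comparative concept can be sketched in a few lines of Python. This is only an illustration: the bodies `x` and `y`, the thermometer readings, and the recorded behaviour of the mercury are hypothetical values chosen for the example, and the function names are not taken from the text.

```python
# Hypothetical thermometer readings for two bodies (assumed values).
temperature = {"x": 21.4, "y": 21.1}

def warmer_star_quantitative(a, b):
    """Quantitative way: Warmer* is defined as 'having a higher temperature'."""
    return temperature[a] > temperature[b]

# Fictitious case: volume differences cannot be measured; we can only observe
# whether the mercury expands or contracts when moved from one body to another.
# The key (b, a) means: mercury first in contact with b, later with a.
mercury_change = {("y", "x"): "expands", ("x", "y"): "contracts"}

def warmer_star_comparative(a, b):
    """Purely comparative way: a is warmer* than b if mercury moved from b to a expands."""
    return mercury_change[(b, a)] == "expands"

print(warmer_star_quantitative("x", "y"))   # uses numbers
print(warmer_star_comparative("x", "y"))    # uses no numbers at all
```

Both functions express the same relation for these two bodies, but only the first presupposes a numerical scale.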

To make a weaker fictitious assumption, suppose that the volume differences 
could be measured and hence the quantitative concept Temperature could be 
defined but that (this is the fictitious feature) no important 
laws containing this concept had been found. In this case the concept 
would be discarded as not fruitful. And hence in this case likewise the 
comparative concept Warmer* would be taken as the only explicatum 
for Warmer. 

Later, when we discuss the problem of explication for the concept of 
confirmation, we shall distinguish three concepts; the classificatory, the 
comparative, and the quantitative concept of confirmation. They are anal- 
ogous to the concepts Warm, Warmer, and Temperature; thus the results 
of the present discussion will then be utilized. 


§ 6. Formalization and Interpretation 


The axiomatic method consists of two phases, formalization and interpreta- 
tion. The formalization of a theory consists in the construction of an axiom 
system. This is a semiformal system; the axiomatic terms are left uninterpreted, 
while some logical terms are taken with their customary meanings. The inter- 
pretation of an axiom system is given by rules which determine the meanings of 
the axiomatic terms. As an illustration for the distinction between the two 
phases, the difference between Peano’s axiom system of arithmetic and the 
Frege-Russell system of arithmetic, which gives an interpretation, is explained. 


The introduction of new concepts into the language of science—whether 
as explicata for prescientific concepts or independently—is sometimes 
done in two separate steps, formalization and interpretation. The pro- 
cedure of separating these steps has steadily grown in importance during 
the last half-century. The two steps are the two phases of what is known 
as the axiomatic (or postulational) method in its modern form (as dis- 
tinguished from its traditional form dating from Euclid). Frequently, the 
first step alone is already very useful, and sometimes considerable time 
passes until it is followed by the second step. 

The formalization (or axiomatization) of a theory or of the concepts of 
a theory is here understood in the sense of the construction of a formal 
system, an axiom system (or postulate system) for that theory. 


We are not speaking here of a formal system in the strict sense, sometimes 
called a calculus (in the strict sense) or a syntactical system; in a system of this 
kind all rules are purely syntactical and all signs occurring are left entirely un- 
interpreted (see [Semantics] § 24). On the other hand, we are not speaking of 
axiom systems of the traditional kind, which are entirely interpreted. In the 
discussions of this book we are rather thinking of those semiformal, semi-inter- 
preted systems which are constructed by contemporary authors, especially 
mathematicians, under the title of axiom systems (or postulate systems). In a 
system of this kind the axiomatic terms (for instance, in Hilbert’s axiom system 
of geometry the terms ‘point’, ‘line’, ‘incidence’, ‘between’, and others) 
remain uninterpreted, while for all or some of the logical terms occurring (e.g., 
‘not’, ‘or’, ‘every’) and sometimes for certain arithmetical terms (e.g., ‘one’, 
‘two’) their customary interpretation is—in most cases tacitly—presupposed. 
(For an explanation of the semiformal character of axiom systems see [Founda- 
tions] § 16.) 

The interpretation of an axiom system consists in the interpretation of 
its primitive axiomatic terms. This interpretation is given by rules specify- 
ing the meanings which we intend to give to these terms; hence the rules 
are of a semantical nature. (They are sometimes called correlative defini- 
tions (Reichenbach’s “Zuordnungsdefinitionen”) or epistemic correla- 
tions (Northrop).) Sometimes the interpretation of a term can be given 
in the simple form of an explicit definition; this definition may be regarded 
as a semantical rule which states that the term in question is to have the 
same meaning as a certain compound expression consisting of terms whose 
meanings are presupposed as known. 

For our later discussions on probability it will be of great importance to 
recognize clearly the character of the axiomatic method and especially the 
distinction between formalization and interpretation. Some authors be- 
lieve they have given a solution of the problem of probability, in our termi- 
nology, an explication for probability, by merely constructing an axiom 
system for probability without giving an interpretation; for a genuine 
explication, however, an interpretation is essential. We shall now illus- 
trate the axiomatic method and the distinction between its two phases by 
taking as an example the arithmetic of natural numbers. The prescientific 
terms of this field are the numerals ‘one’, ‘two’, etc. (or the corresponding 
figures) and terms for arithmetical operations like ‘plus’ (previously 
‘and’), ‘times’, etc., as they are used in everyday language for counting 
things and for calculating with numbers applied to things. Preliminary 
steps toward a systematization of the theory and an explication of the 
terms have been made for several thousand years in the form of rules of 
calculation. The first axiom system for arithmetic which satisfies modern 
requirements as to the exactness of formulation is the famous axiom system 
of G. Peano. This system takes as primitive axiomatic terms ‘0’, 
‘number’, and ‘successor’. It consists of five axioms, among them: ‘0 is a 
number’ and ‘the successor of a number is a number’. On the basis of the 
primitive terms mentioned, terms for the ordinary arithmetical operations 
can be introduced by recursive definitions. On the basis of the axioms and 
the recursive definitions, the ordinary theorems of elementary arithmetic 
can be proved. In this procedure the primitive terms mentioned and the 
terms introduced on their basis remain uninterpreted. It is only for didactic, 
psychological reasons that not arbitrarily chosen symbols are 
taken as primitive terms but customary signs or words. Their well-known 
meanings facilitate the manipulations of the signs in the deductions, but 
these deductions are formal in the sense that they do not make use of the 
meanings of the axiomatic terms at any point. 
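
The purely formal character of such deductions can be sketched as follows. The Python below is an illustration under stated assumptions, not Peano's own formulation: terms are represented as nested tuples built from two uninterpreted signs, the recursive definitions of `plus` and `times` are the usual textbook ones, and the helper `numeral` is a hypothetical abbreviation for a term with a given number of successor signs.

```python
# Terms are built from two uninterpreted primitives: a zero sign and a successor sign.
ZERO = ("0",)

def succ(n):
    """Form the term 'successor of n' without evaluating anything."""
    return ("succ", n)

def plus(m, n):
    """Recursive definition: m + 0 = m ; m + succ(k) = succ(m + k)."""
    if n == ZERO:
        return m
    return succ(plus(m, n[1]))

def times(m, n):
    """Recursive definition: m * 0 = 0 ; m * succ(k) = (m * k) + m."""
    if n == ZERO:
        return ZERO
    return plus(times(m, n[1]), m)

def numeral(k):
    """Abbreviation (an assumption of this sketch): the term with k successor signs."""
    term = ZERO
    for _ in range(k):
        term = succ(term)
    return term

# The formula '3 + 2 = 5' is obtained by manipulating signs only.
print(plus(numeral(3), numeral(2)) == numeral(5))
```

Nothing in `plus` or `times` depends on what the zero sign or the successor sign stands for; the functions merely rewrite terms according to the definitions.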

Peano’s axiom system, by furnishing the customary formulas of arith- 
metic, achieves in this field all that is to be required from the point of view 
of formal mathematics. However, it does not yet achieve an explication of 
the arithmetical terms ‘one’, ‘two’, ‘plus’, etc. In order to do this, an in- 
terpretation must be given for the semiformal axiom system. There is an 
infinite number of true interpretations for this system, that is, of sets of 
entities fulfilling the axioms, or, as one usually says, of models for the 
system. One of them is the set of natural numbers as we use them in 
everyday life. But it can be shown that all sets of any entities exhibiting 
the same structure as the set of natural numbers in their order of magni- 
tude—in Russell’s terminology, all progressions—are likewise models of 
Peano’s system. From the point of view of the formal system, no distinc- 
tion is made between these infinitely many models. However, in order to 
state the one interpretation we are aiming at, we have to give an explica- 
tion for the terms ‘one’, ‘two’, etc., as they are meant when we apply them 
in everyday life. 
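
A small sketch may make the multiplicity of models concrete. The three progressions below (the natural numbers, the even numbers, and the powers of two) are examples chosen for this illustration; the class `Progression` and the helpers `nth` and `plus` are hypothetical names, and `plus` is given only in the restricted form "add the k-th numeral".

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Progression:
    """An interpretation of the primitive terms: a first entity and a successor operation."""
    zero: Any
    succ: Callable[[Any], Any]

def nth(model, k):
    """Entity denoted in the model by the numeral with k successor signs."""
    entity = model.zero
    for _ in range(k):
        entity = model.succ(entity)
    return entity

def plus(model, a, k):
    """a plus the k-th numeral: apply the model's successor k times to a."""
    for _ in range(k):
        a = model.succ(a)
    return a

naturals = Progression(0, lambda n: n + 1)
evens = Progression(0, lambda n: n + 2)
powers_of_two = Progression(1, lambda n: n * 2)

for model in (naturals, evens, powers_of_two):
    # The formula '3 + 2 = 5' holds in every one of these models.
    print(plus(model, nth(model, 3), 2) == nth(model, 5), nth(model, 5))
```

The formula comes out true in all three models, although the sign ‘5’ denotes a different entity in each; the formal system alone does not single out the interpretation used in counting.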

The first exact explications for the ordinary arithmetical terms have 
been given by G. Frege and later in a similar way by Bertrand Russell. 
Both Frege and Russell give explicata for the arithmetical concepts by 
explicit definitions on the basis of a purely logical system whose primitive 
terms are presupposed as interpreted. On the basis of this interpretation 
of the arithmetical terms, Peano’s axioms become provable theorems in 
logic. It is a historically and psychologically surprising fact that this ex- 
plication was such a difficult task and was achieved so late, although the 
explicanda, the elementary concepts of arithmetic, are understood and 
correctly applied by every child and have been successfully applied and 
to some extent also systematized for thousands of years. 

It is important to see clearly the difference between Peano’s and Frege’s 
systems of arithmetic. Peano’s system, as mentioned, does not go beyond 
the boundaries of formal mathematics. Only Frege’s system enables us to 
apply the arithmetical concepts in the description of facts; it enables us to 
transform a sentence like ‘the number of fingers on my right hand is 5’ 
into a form which does not contain any arithmetical terms. Peano’s sys- 
tem contains likewise the term ‘5’, but only as an uninterpreted symbol. 
It enables us to derive formulas like ‘3 + 2 = 5’, but it does not tell us 
how to understand the term ‘5’ when it occurs in a factual sentence like 
that about the fingers. Only Frege’s system enables us to understand sen- 
tences of this kind, that is to say, to know what we have to do in order 
to find out whether the sentence is true or not. 

The result of this discussion is, in general terms, the following. As soon 
as we go over from the field of formal mathematics to that of knowledge 
about the facts of nature, in other words, to empirical science, which in- 
cludes applied mathematics, we need more than a mere calculus or axiom 
system; an interpretation must be added to the system. 


Concerning the arithmetical systems of Peano, Frege, and Russell: G. Peano, 
Arithmetices principia (1889); G. Frege, Grundlagen der Arithmetik (1884); 
Grundgesetze der Arithmetik (2 vols.; 1893, 1903); Bertrand Russell, The principles 
of mathematics (1903); with A. N. Whitehead, [Princ. Math.]. For a discussion 
of the distinction between Peano’s arithmetic and that of Frege and 
Russell see Russell, Introduction to mathematical philosophy (1918), chaps. 1 
and 2; and Carnap [Foundations] §§ 17 ff. 


CHAPTER II 
THE TWO CONCEPTS OF PROBABILITY 


The various theories of probability are attempts at an explication of what is 
regarded as the prescientific concept of probability. In fact, however, there are 
two fundamentally different concepts for which the term ‘probability’ is in gen- 
eral use. The two concepts are as follows, here distinguished by subscripts. 

(i) Probability₁ is the degree of confirmation of a hypothesis h with respect 
to an evidence statement e, e.g., an observational report. This is a logical, se- 
mantical concept. A sentence about this concept is based, not on observation of 
facts, but on logical analysis; if it is true, it is L-true (analytic). 

(ii) Probability₂ is the relative frequency (in the long run) of one property of 
events or things with respect to another. A sentence about this concept is factu- 
al, empirical. 

Both concepts are important for science. Many authors who take one of the 
two concepts as explicandum are not aware of the importance or even of the 
existence of the other concept. This has led to futile controversy. 

Probability₂ is obviously an objective concept. It is important to recognize 
that probability₁ is likewise objective. It seems to me that most of those authors 
from classical to present times who do not accept a frequency interpretation 
of probability mean something like probability₁ as their explicandum and 
that their systems themselves are objectivistic. The latter fact is often veiled 
by the use of misleading subjectivistic formulations, mostly in preliminary 
explanations, e.g., in terms of degree of actual or reasonable belief. This psy- 
chologism in inductive logic, i.e., in the theory of probability, is quite analogous 
to the well-known psychologism in deductive logic, which is more and more 
eliminated in modern logic. 


§ 8. The Semantical Concepts of Confirmation 


The concepts of confirmation to be dealt with in this book are semantical, 
i.e., based upon meaning, and logical, i.e., independent of facts. They belong, 
not to deductive, but to inductive logic. We distinguish three semantical con- 
cepts of confirmation: (i) the classificatory concept of confirmation (‘the 
hypothesis h is confirmed by the evidence e’, in symbols ‘ℭ(h,e)’); (ii) the comparative 
concept of confirmation (‘h is confirmed by e at least as highly as h′ by 
e′’, ‘𝔐ℭ(h,e,h′,e′)’); (iii) the quantitative concept of confirmation, the concept 
of degree of confirmation (‘h is confirmed by e to the degree q’, ‘c(h,e) = q’). 


One of the chief tasks of this book will be the explication of certain con- 
cepts which are connected with the scientific procedure of confirming or 
disconfirming hypotheses with the help of observations and which we 
therefore will briefly call concepts of confirmation. We leave for later chap- 
ters the task of laying down definitions of explicata; at present we are 
concerned only with an explanation of the explicanda, in other words, 
with the formulation of our problem, not yet with its solution. 

The procedure of confirmation is a complex procedure consisting of 
components of different kinds. In this book we are concerned only with 
what may be called the logical aspect of confirmation, namely, with cer- 
tain logical relations between sentences (or propositions expressed by these 
sentences). Within the practice of the procedure of confirmation, these 
relations are of interest to the scientist, for instance, in the following situation. 
He intends to examine a certain hypothesis h; he makes many observations 
of particular events which he regards as relevant for judging 
the hypothesis h; he formulates the results of all observations made or as 
much of them as are relevant in a report e, which is a long sentence. Then 
he tries to determine whether and to what degree the hypothesis h is con- 
firmed by the observational evidence e. This last question alone is what 
we shall be concerned with. We call it a logical question because, once a 
hypothesis is formulated by h and any possible evidence by e (it need not 
be the evidence actually observed), the problem whether and how much h 
is confirmed by e is to be answered merely by a logical analysis of h and e 
and their relations. This question is not a question of facts in the sense that 
factual knowledge is required to find the answer. The sentences h and e, 
which are studied, do themselves certainly refer to facts. But, once h and e 
are given, the question mentioned requires only that we be able to under- 
stand them, that is, to grasp their meanings, and to establish certain rela- 
tions which are based upon their meanings. Since we take semantics as 
the theory of the meanings of expressions in language and especially of 
sentences (this will be explained later), the relations between h and e to 
be studied may be characterized as semantical; therefore we call them 
semantical concepts of confirmation. 

The question of confirmation in which we are here interested has been 
characterized above as a logical question. In order to avoid misunder- 
standings, a qualification should here be made. The question mentioned 
does not belong to deductive logic but to inductive logic. The similarities 
and differences between these two branches of logic will later be discussed 
in detail (§ 43B). Both branches have in common that the solution of 
their problems does not require factual knowledge but only analysis of 
meaning; therefore both parts of logic belong to semantics. This similarity 
makes it possible to explain the logical character of the relations of confir- 
mation by an analogy with a more familiar relation in deductive logic, 
viz., the relation of h being a logical consequence of e, in our terminology, 
the relation of L-implication (i.e., logical implication or entailment, in 
distinction to material implication) between e and h. Let e be the sentence 
‘all men are mortal, and Socrates is a man’, and h the sentence 
‘Socrates is mortal’. Both e and h have factual content. But, in order to 
answer the question whether e L-implies h, we need no factual knowledge, 
we need not know whether e is true or false, whether h is true or false, 
whether anybody believes in e and, if so, on what basis. All that is required 
is a logical analysis of the meanings of the two sentences. Analogously, to 
answer the question how much a hypothesis h is confirmed by an observational 
report e (a question in logic, but here in inductive, not in deductive, 
logic), we need not know whether e is true or false, whether h is 
true or false, whether anybody believes in e and, if so, on the basis of ob- 
servations or just by imagination or in whatever way else. All we need is 
a logical analysis of the meanings of the two sentences. That is the reason 
why we call our problem the logical or semantical problem of confirmation, 
in distinction to what might be called methodological problems of confirmation 
(§ 44A), e.g., how best to construct an apparatus and to arrange 
it for certain experiments, how to carry out the experiments, how to 
observe the results, etc., all this for the purpose of an experimental ex- 
amination of a given hypothesis. 
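
That the question of L-implication calls for no factual knowledge can be illustrated by a truth-table check. The sketch below rests on a simplifying assumption that is not in the text: the universal sentence ‘all men are mortal’ is restricted to the single individual Socrates, so that e becomes ‘if Socrates is a man, he is mortal, and Socrates is a man’. The names `ATOMS`, `e`, `h`, and `l_implies` are hypothetical.

```python
from itertools import product

# Two atomic sentences about Socrates (propositional simplification).
ATOMS = ("man", "mortal")

def e(v):
    """Evidence: 'if Socrates is a man, he is mortal, and Socrates is a man'."""
    return ((not v["man"]) or v["mortal"]) and v["man"]

def h(v):
    """Hypothesis: 'Socrates is mortal'."""
    return v["mortal"]

def l_implies(premise, conclusion):
    """True if no assignment of truth values makes premise true and conclusion false."""
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if premise(v) and not conclusion(v):
            return False
    return True

print(l_implies(e, h))  # e L-implies h
print(l_implies(h, e))  # the converse does not hold
```

The check runs through all possible assignments of truth values; at no point does it ask which assignment is the actual one, and this is the sense in which the question is logical rather than factual.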

In this book we shall deal with three semantical concepts of confirmation. 
Although in the application outlined above, the evidence is usually an ob- 
servational report and the hypothesis a law or a prediction, we shall not 
restrict our concepts of confirmation to any particular contents or forms 
of the two sentences. The three semantical concepts of confirmation be- 
long to the three levels of concepts earlier explained (§ 4). 

(i) The classificatory concept of confirmation is that relation between 
two sentences h and e which is usually expressed by sentences of the following 
forms: 


‘h is confirmed by e.’ 

‘h is supported by e.’ 

‘e gives some (positive) evidence for h.’ 

‘e is evidence substantiating (or corroborating) the assumption of h.’ 


Here e is ordinarily, as in the previous example, an observational re- 
port, but it may also refer to particular states of affairs not yet known 
but merely assumed and may even include assumed laws; h is usually a 
statement about an unknown state of affairs, e.g., a prediction, or it may 
be a law or any other hypothesis. It is clear that this concept of confirma- 
tion is a relation between two sentences, not a property of one of them. 
Thus it is analogous to the examples with ‘close’ or ‘acquainted’ for classificatory 
concepts in § 4 rather than to those of properties. Customary formulations 
which mention only the hypothesis are obviously elliptical; the 
evidence is tacitly understood. For instance, when a physicist says, ‘This 
hypothesis is well confirmed,’ he means ‘. . . on the evidence of the ob- 
servational results known today to physicists.’ (On the disadvantages of 
these elliptical formulations see below, §§ 10A and 42A.) In the discussion 
of explicata for the classificatory concept of confirmation we shall use 
the symbol ‘ℭ’ (§ 86); thus ‘ℭ(h,e)’ will correspond to the formulations 
mentioned above. 

(ii) The comparative concept of confirmation is usually expressed in 
sentences of the following or similar forms: 

a. ‘h is more strongly confirmed (or supported, substantiated, corroborated, 
etc.) by e than h′ by e′’. 

Here we have a tetradic relation between four sentences. It may also be 
regarded as a dyadic relation between two pairs of sentences, h,e and 
h′,e′. In general, the two hypotheses h and h′ are different from one another, 
and likewise the two bodies of evidence e and e′. Some scientists will 
perhaps doubt whether a comparison of this most general form is possible 
and may, perhaps, restrict the application of the comparative concept to 
those situations where two bodies of evidence are compared with respect 
to the same hypothesis (example (b)), or where two hypotheses are ex- 
amined with respect to one evidence (example (c)). In either case the com- 
parative concept is a triadic relation between three sentences. 

b. ‘The general theory of relativity is more strongly confirmed by the 
results of laboratory experiments and astronomical observations 
known today than by those known in 1905.’ 

c. ‘The optical phenomena available to physicists in the nineteenth 
century were more adequately explained by the wave theory of light 
than by the corpuscular theory; in other words, they gave stronger 
support to the former theory than to the latter.’ 

The forms (a), (b), and (c) use that kind of comparative concept which 
means ‘more (in a certain respect)’. We have seen earlier (§ 4) that there 
is a second kind which means ‘more or equal’ and which, although less 
customary, is sometimes more useful. This is the case also with the con- 
cepts of confirmation. Therefore, when we later approach the problem of 
explication for a comparative concept of confirmation (chap. vii), we shall 
take as explicandum the relation usually expressed in sentences of the fol- 
lowing or a similar form: 

d. ‘h is confirmed by e at least as strongly (i.e., either more strongly or 
equally) as h′ by e′’. 
We shall then use ‘𝔐ℭ’ as a symbol for an explicatum to be discussed. 
Thus ‘𝔐ℭ(h,e,h′,e′)’ corresponds to the customary formulation (d). 

(iii) The quantitative (or metrical) concept of confirmation, the con- 
cept of degree of confirmation. Opinion seems divided as to whether or 
not a concept of this kind ever occurs in the customary talk of scientists, 
that is to say, whether they ever assign a numerical value to the degree 
to which a hypothesis is supported by given observational material or 
whether they use only classificatory and comparative concepts of con- 
firmation. For the present discussion we leave this question open; even 
if the latter were the case, an attempt to find a quantitative explicatum 
for the comparative explicandum would be worth while. (This would be 
analogous to the example discussed earlier (§ 5): for the comparative ex- 
plicandum Warmer, not only a comparative explicatum Warmer* but also 
an adequate quantitative explicatum Temperature has been found.) 
Again opinion today is divided as to whether there is a good prospect for 
finding a satisfactory quantitative concept of confirmation. We shall dis- 
cuss this problem in detail. In our general discussions of possible solutions 
the symbol ‘c’ will be used for the degree of confirmation (following Hosi- 
asson). Thus ‘c(h,e) = q’ will be written for ‘the degree of confirmation of 
h with respect to e is q’; here, h and e are sentences and q is a real number 
of the interval 0–1. In Volume II we shall define a specific concept of this 
kind and propose it as explicatum; for it the symbol ‘c*’ will be used. On 
the basis of this concept a system of inductive logic will be constructed. 
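
The logical form of the quantitative and the comparative concept can be sketched with a toy example. It must be stressed that the function `c` below is only a stand-in invented for this illustration: it counts cases in a hypothetical universe of six equally weighted outcomes of one throw of a die, and it is not the concept c* announced above. The comparative relation `mc` is here defined in the quantitative way, by analogy with Warmer* defined through Temperature; that definition, too, is an assumption of the sketch.

```python
from fractions import Fraction

# Hypothetical toy universe: the six possible outcomes of one throw of a die.
CASES = range(1, 7)

def c(h, e):
    """Toy degree of confirmation of h on e: share of the e-cases in which h also holds."""
    e_cases = [w for w in CASES if e(w)]
    if not e_cases:
        raise ValueError("the evidence holds in no case")
    return Fraction(sum(1 for w in e_cases if h(w)), len(e_cases))

def mc(h, e, h2, e2):
    """Comparative concept, defined the quantitative way: h on e at least as strongly as h2 on e2."""
    return c(h, e) >= c(h2, e2)

def is_even(w):
    return w % 2 == 0

def above_three(w):
    return w > 3

def is_six(w):
    return w == 6

print(c(is_even, above_three))               # a value q in the interval 0-1
print(mc(is_even, above_three, is_six, is_even))
```

Both `c` and `mc` take sentences (here represented as predicates over cases) as arguments and consult no observations; `c` has two arguments and yields a number, while `mc` is a tetradic relation that yields only yes or no.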


§ 9. The Two Concepts of Probability 


The various theories of probability offer many different explicata. They are 
sometimes classified in three groups: (i) the classical conception, (ii) the logical 
conception, (iii) the frequency conception. However, it is found that the various 
theories are not answers to the same problem, i.e., explications of the same ex- 
plicandum. There are two principal explicanda, two fundamentally different 
meanings of the word ‘probability’ in presystematic use: (i) probability₁ = degree 
of confirmation and (ii) probability₂ = relative frequency. The controversy 
between the frequency conception and the theories on the other side is 
seen as futile, caused chiefly by the fact that most authors on either side do not 
realize that those on the other side start from a different explicandum whose 
explication is likewise of great importance for science. A few explanations on 
probability, are given. 


The history of the theory of probability is the history of attempts to 
find an explication for the prescientific concept of probability. The num- 
ber of solutions which have been proposed for this problem in the course 
of its historical development is rather large. The differences, though some- 
times slight, are in many cases considerable. To bring some order into the 
bewildering multiplicity, several attempts have been made to arrange the 
many solutions into a few groups. The following is a simple and plausible 
classification of the various conceptions of probability into three groups 
(proposed by Nagel [Principles]): (i) the classical conception, originated 
by Jacob Bernoulli, systematically developed by Laplace, and repre- 
sented by their followers in various forms; here, probability is defined as 
the ratio of the number of favorable cases to the number of all possible 
cases; (ii) the conception of probability as a certain objective logical rela- 
tion between propositions (or sentences); the chief representatives of this 
concept are John M. Keynes and Harold Jeffreys; (iii) the conception of 
probability as relative frequency, developed most completely in the theo- 
ries of Richard von Mises, Hans Reichenbach, and those of modern 
mathematical statistics. 

At the present we shall not enter a discussion of these various concep- 
tions. While the main point of interest both for the authors and for the 
readers of the various theories of probability is normally the solutions 
proposed in those theories, we shall inspect the theories from a different 
point of view. We shall not ask what solutions the authors offer but rather 
which problems the solutions are intended to solve; in other words, we 
shall not ask what explicata are proposed but rather which concepts are 
taken as explicanda. 

This question may appear superfluous, and the fact obvious, that the 
explicandum for every theory of probability is the prescientific concept of 
probability, i.e., the meaning in which the word ‘probability’ is used in 
the prescientific language. Is the assumption correct, however, that there 
is only one meaning connected with the word ‘probability’ in its customary 
use or, at the least, that only one meaning has been chosen by the authors 
as their explicandum? When we look at the formulations which the au- 
thors themselves offer in order to make clear which meanings of ‘proba- 
bility’ they intend to take as their explicanda, we find phrases as different 
as ‘degree of belief’, ‘credibility’, ‘degree of reasonable expectation’, ‘de- 
gree of possibility’, ‘degree of proximity to certainty’, ‘degree of partial 
truth’, ‘relative frequency’, and many others. This multiplicity of phrases 
shows that any assumption of a unique explicandum common to all au- 
thors is untenable. We might even be tempted to go to the opposite ex- 
treme and to conclude that the authors are dealing not with one but with 
a dozen or more different concepts. However, I believe that this multiplic- 
ity is misleading. It seems to me that the number of explicanda in all the 
various theories of probability is neither just one nor about a dozen, but in 
all essential respects—leaving aside slight variations—very few and chiefly 
two. In the following discussions we shall use subscripts in order to distinguish 
these two principal meanings of the term ‘probability’ from which 
most of the various theories of probability start; we are, of course, dis- 
tinguishing between two explicanda and not between the various explicata 
offered by these theories, whose number is much greater. The two concepts 
are (i) probability₁ = degree of confirmation; (ii) probability₂ = relative 
frequency in the long run. Strictly speaking, there are two groups of 
concepts, since, both for (i) and for (ii), there is a classificatory, a comparative, 
and a quantitative concept; however, at the present moment, we may 
leave aside these distinctions. 

Let me emphasize again that the distinction made here refers to two 
explicanda, not to two explicata. That there is more than one explicatum 
is obvious; and, indeed, their number is much larger than two. But most 
investigators in the field of probability apparently believe that all the 
various theories of probability are intended to solve the same problem 
and hence that any two theories which differ fundamentally from each 
other are incompatible. Consequently, we find that most representatives 
of the frequency conception of probability reject all other theories and, 
vice versa, that the frequency conception is rejected by most of the au- 
thors of other theories. This whole controversy seems futile and un- 
necessary. 

A few examples may show how much of the futile controversy between 
representatives of different conceptions of probability is due to the lack 
of awareness, on both sides, of the existence and importance of the prob- 
ability concept of the other side. We take as examples a prominent con- 
temporary representative of each conception: Mises, who constructed the 
first complete theory based on the frequency conception, and Jeffreys, 
who constructed the most advanced theory based on probability₁. Mises
seems to believe that probability₂ is the only basis of the Calculus of
Probability ([Probab.], first lecture). To speak of the probability of the
death of a certain individual seems to him meaningless. Any use of the
term ‘probability’ in everyday life other than in the statistical sense of
probability₂ has in his view nothing to do with the Calculus of Probability
and cannot take numerical values. That he regards Keynes’s conception 
of probability as thoroughly subjectivistic indicates clearly his misunder- 
standing (see below, § 12A). 

On the other hand, Jeffreys lays down certain requirements which every 
theory of probability (and that means for him probability₁) should fulfil
and then rejects all frequency theories, that is, theories of probability₂,
because they do not fulfil his requirements. Thus he says: “No ‘objective’
definition of probability in terms of actual or possible observations . . .
is admissible” ([Probab.], p. 11), because the results of observations are
initially unknown, and, consequently, we could not know the fundamental
principles of the theory and would have no starting point. He even goes
so far as to say that, “in practice, no statistician ever uses a frequency
definition, but that all use the notion of degree of reasonable belief,
usually without ever noticing that they are using it” (p. 300). While Mises’
concern with explicating the empirical concept of probability₂ by the limit
of relative frequency in an infinite sequence has led him to apply the term
‘probability’ only in cases where such a limit exists, Jeffreys
misunderstands his procedure completely and accuses the empiricist Mises of
apriorism: “The existence of the limit is taken as a postulate by Mises. . . .
The postulate is an a priori statement about possible experiments and is
in itself objectionable” (p. 304). Thus we find this situation: Mises and
Jeffreys both assert that there is only one concept of probability that is of
scientific importance and that can be taken as the basis of the Calculus of
Probability. The first maintains that this concept is probability₂ and
certainly not anything like probability₁; the second puts it just the other
way round.

It has repeatedly occurred in the history of science that a vehement but 
futile controversy arose between the proponents of two or more explicata 
who shared the erroneous belief that they had the same explicandum; 
when finally it became clear that they meant different explicanda, un- 
fortunately designated by the same term, and that the different explicata 
were hence compatible and moreover were found to be equally fruitful 
scientific concepts, the controversy evaporated into nothing. 


One of the outstanding examples is the controversy between the followers of 
Descartes and those of Leibniz concerning the concept of living force (‘vis viva’, 
also called ‘quantity of motion’). Both sides believed that it was practically
clear enough what was meant by the ‘living force’ of a moving body; both agreed
that this magnitude increases with the mass and the velocity of the body. But
they disagreed in their explications for this supposedly one explicandum. The
first group proposed as explicatum mv, the product of mass and velocity; the
second rejected this and proposed instead mv². It took a long time until it
became clear that the two assertions of the disputants were not two incompatible
answers to the same problem but two correct answers to two different
problems. Both concepts were recognized as fruitful and necessary for mechanics;
the first is the magnitude now called momentum, the second (with the factor 1/2
attached to it) is now called kinetic energy. To the physicist of our time,
familiar with both concepts, the historic dispute about the question which of
the two concepts is “the right one” seems somewhat strange. As soon as we
recognize the distinction between probability₁ and probability₂, the
contemporary controversy about probability will appear just as strange and futile.

The distinction between the two concepts which serve as explicanda is
often overlooked on both sides. This is primarily due to the unfortunate 
fact that both concepts are designated by the same familiar but ambigu- 
ous word ‘probability’. Although many languages contain two words 
(e.g., English ‘probable’ and ‘likely’, Latin ‘probabilis’ and ‘verisimilis’, 
French ‘probable’ and ‘vraisemblable’), these words seem in most cases to
be used in about the same way or at any rate not to correspond to the two 
concepts we have distinguished. Some authors (e.g., C. S. Peirce, R. A. 
Fisher, and Jeffreys) have suggested utilizing the plurality of available 
words for the distinction of certain concepts. We shall later (§ 60) use 
the term ‘likelihood’ in a certain special sense for which it was proposed 
by Jeffreys. For the two concepts probability₁ and probability₂, however,
we shall simply make the distinction with the help of the subscripts. The
terms ‘probability₁’ and ‘probability₂’ will chiefly be used in our
discussions of explicanda, especially when we analyze customary formulations
in prescientific language and the formulations of other authors. On the 
other hand, in the discussion of possible explicata we shall mostly use the 
terms ‘(degree of) confirmation’ and ‘relative frequency’. 

Probability₁, the logical concept of probability as explicandum, has
been explained in the preceding section and will later be analyzed in
greater detail (§ 41). A few explanations may here be given for
probability₂, just to make clear its distinction from probability₁. The theory of
probability₂ itself lies outside the program of this book, which deals with
inductive logic and therefore with probability₁. A typical example of the
use of the term ‘probability’ in the sense of probability₂ is the following
statement:

‘The probability of casting an ace with this die is 1/6.’ 


Statements of this form refer to two properties (or classes) of events: 
(i) the reference class K, here the class of the throws of this die; and (ii) 
the specific property M, here the property of being a throw with any die 
resulting in an ace. The statement says that the probability₂ of M with
respect to K is 1/6. The statement is tested by statistical investigations.
A sufficiently long series of, say, n throws of the die in question is made,
and the number m of these throws which yield an ace is counted. If the
relative frequency m/n of aces in this series is sufficiently close to 1/6, the
statement is regarded as confirmed. Thus, the other way round, the
statement is understood as predicting that the relative frequency of aces
thrown with this die in a sufficiently long series will be about 1/6. This
formulation is admittedly inexact; but it is only intended to indicate the
meaning of ‘probability₂’ as an explicandum. To make this concept exact
is the task of the explication.
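
The testing procedure just described can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the die is replaced by a pseudo-random generator assumed to be fair, and the function names, the seed, and the tolerance standing in for "sufficiently close" are hypothetical choices made for the illustration, not part of the text.

```python
import random

def relative_frequency(n, seed=0):
    """Simulate n throws of a die assumed to be fair; return (m, m / n),
    where m is the number of throws yielding an ace."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    m = sum(1 for _ in range(n) if rng.randint(1, 6) == 1)
    return m, m / n

def is_confirmed(freq, stated=1 / 6, tolerance=0.01):
    """'Sufficiently close' is rendered by an arbitrary tolerance (an assumption)."""
    return abs(freq - stated) <= tolerance

if __name__ == "__main__":
    for n in (60, 6_000, 600_000):
        m, freq = relative_frequency(n)
        print(f"n={n}: m={m}, m/n={freq:.4f}, confirmed={is_confirmed(freq)}")
```

The tolerance makes the inexactness of the explicandum visible: how close is close enough, and how long a series is long enough, are exactly the points which an explication has to settle.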

There are two schools of thought which take the frequency concept, 
probability₂, as explicandum. The first, usually referred to as that of the
frequency theory of probability, takes as explicatum for this explicandum 
the limit of the relative frequency of M within an infinite sequence; in 
our example and similar ones, this sequence may consist of the events of 
the class K, here assumed to be infinite, in their temporal order. This
explication was first proposed by Venn ([Logic] (2d ed., 1876), chap. v, secs.
36, 37; (3d ed., 1888), chap. vi, secs. 36, 37). It was systematically
developed by Mises and Reichenbach. Mises’ definition requires that the
reference sequence exhibit a random order. This concept of randomness 
involves certain difficulties. It was originally defined in too strong a form 
which was found to lead to contradictions. A suitable redefinition avoid- 
ing the contradiction was proposed by Wald [Kollektiv]. The problems 
here involved are still under discussion. The second school is that of mod- 
ern mathematical statistics as developed by R. A. Fisher, J. Neyman, 
E. S. Pearson, and others in the course of the last decades. (For the publi- 
cations of the authors mentioned see the Bibliography; for technical sys- 
tematic expositions of the whole theory see Wilks [Statistics] and Cramér 
[Statistics]; Wald [Principles] gives a clear survey of the basic ideas and 
methods.) Here, ‘probability’ is taken as an undefined term in an axiomat- 
ic system. The reference class K, called the population, is not required to 
be denumerable as in the first school but may be a continuum; therefore 
the limit concept is not directly applicable. However, both the formulation 
of the axioms and the nonformal explanations of the term ‘probability’ 
make it clear that it is meant in the sense of relative frequency (see, e.g.,
Fisher [Foundations], p. 312; Wilks [Statistics], pp. 3-6; Cramér
[Statistics], pp. 148-51). [Some further remarks concerning the limit concept in
connection with probability₂ will be made later, in § 106B.]
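
The explicatum of the first school may be written out as follows. This is only a sketch in a notation introduced here for illustration: the symbol m_n, for the number of those among the first n elements of the reference sequence which have the property M, is an assumption and does not occur in the text.

```latex
\[
  \text{probability}_2(M, K) \;=\; \lim_{n \to \infty} \frac{m_n}{n}
\]
```

The formula presupposes that the elements of K are arranged in a sequence and that the limit exists; for a population forming a continuum it is, as stated above, not directly applicable.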

It is clear that the concept of probability₂ involves statistics of mass
phenomena and their frequencies. However, this is not a distinguishing
characteristic of this concept. The same also frequently holds for
probability₁ in the sense that the evidence referred to in a probability₁
statement is, as we shall see (§ 44B), often of a statistical nature specifying,
for instance, the frequency of a property in a given population or in a 
given sample taken from a population. 

We shall later (§ 42) come back to the discussion of the distinction
between probability₁ and probability₂ and, in particular, the problem of how
the word ‘probability’, which originally meant only probability₁, came to
be used also in the sense of probability₂.


§ 10. The Logical Nature of the Two Probability Concepts 


Both probability₁ and probability₂, taken as quantitative concepts, are
functions of two arguments, whose values are real numbers of the interval 0-1.
A. The arguments of probability₁ are sentences (or propositions expressed by
them). Probability₁ has two arguments, the hypothesis and the evidence. A
reference to the latter is often omitted; but this omission leads sometimes to a neglect
of the relativity of probability₁ and thereby to misconceptions. An elementary
statement of probability₁ is not factual but L-determinate. B. The arguments
of probability₂ are properties; its elementary statements are factual, empirical.
However, the theorems of the theory of probability₂ state, not values of this
function, but general relations between such values, and are L-true. Those
authors who support a frequency theory of probability have clearly probability₂ as
their explicandum. I believe that the explicandum of most of the others is
probability₁, in spite of the variety of their explanations.

On the basis of the preceding explanations, let us now characterize the
two probability concepts not with respect to what they mean but merely 
with respect to their logical nature, more specifically, with respect to the 
kind of entities to which they are applied and the logical nature of the 
simplest sentences in which they are used. [Since the prescientific use of 
the two concepts is often too vague and incomplete, e.g., because of the 
omission of the second argument (viz., the evidence or the reference 
class), we take here into consideration the more careful use by authors on 
probability. However, we shall be more concerned with their general dis- 
cussions than with the details of their constructed systems.] For the sake 
of simplicity, let us consider the two concepts in their quantitative forms 
only. They may be taken also in their comparative and in their
classificatory forms (as explained for probability₁, i.e., confirmation, in § 8), and
these other forms would show analogous differences. Probability₁ and
probability₂, taken as quantitative concepts, have the following
characteristics in common: each of them is a function of two arguments; their
values are real numbers belonging to the interval 0-1. Their
characteristic differences will now be explained.


A. Probability₁, Degree of Confirmation


1. The two arguments are variously described as events (in the literal 
sense, see below), states of affairs, circumstances, and the like. Therefore 
each argument is expressible by a sentence and, hence, is, in our
terminology, a proposition. Another alternative consists in taking as arguments
the sentences expressing the propositions, describing the events, etc. We
shall choose this alternative and hence take probability₁ as a semantical
concept (as in § 8). (Fundamentally it makes no great difference whether 
propositions or sentences are taken as arguments; but the second method 
has certain technical advantages which will be explained later, in § 52.) 

2. An elementary statement of probability₁, i.e., one which attributes to
two given arguments a particular number as value of probability₁, is
either L-true (i.e., logically true, analytic) or L-false (i.e., logically false,
logically self-contradictory), hence in any case L-determinate, not factual 
(synthetic). (For an explanation of the L-terms see § 20.) Therefore, a 
statement of this kind is to be established by logical (semantical) analysis 
alone, as has been explained earlier (§ 8). It is independent of the con- 
tingency of facts because it does not say anything about facts (although 
the two arguments do in general refer to facts). 

Many empiricist authors have rejected the logical concept of
probability₁, as distinguished from probability₂, because they believe that its use
violates the principle of empiricism and that, therefore, probability₂ is
the only concept admissible for empiricism and hence for science. One of
the reasons given for this view is as follows. The concept of probability₁ is
applied also in cases in which the hypothesis h is a prediction concerning 
a particular event, e.g., the prediction that it will rain tomorrow or that 
the next throw of this die will yield an ace. Some philosophers believe that 
an application of this kind violates the principle of verifiability (or con- 
firmability). They might say, for example: “How can the statement ‘the 
probability of rain tomorrow on the evidence of the given meteorological 
observations is one-fifth’ be verified? We shall observe either rain or not- 
rain tomorrow, but we shall not observe anything that can verify the value 
one-fifth.” This objection, however, is based on a misconception
concerning the nature of the probability₁ statement. This statement does not
ascribe the probability₁ value 1/5 to tomorrow’s rain but rather to a
certain logical relation between the prediction of rain and the meteorological
report. Since the relation is logical, the statement is, if true, L-true;
therefore it is not in need of verification by observation of tomorrow’s weather
or of any other facts. The situation may be clarified by a comparison with
deductive logic. Let h be the sentence ‘there will be rain tomorrow’ and j
the sentence ‘there will be rain and wind tomorrow’. Suppose somebody
makes the statement in deductive logic: ‘h follows logically from j.’
Certainly nobody will accuse him of apriorism either for making the statement or
for claiming that for its verification no factual knowledge is required. The
statement ‘the probability₁ of h on the evidence e is 1/5’ has the same
general character as the former statement; therefore it cannot violate
empiricism any more than the first. Both statements express a purely
logical relation between two sentences. The difference between the two
statements is merely this: while the first states a complete logical
implication, the second states only, so to speak, a partial logical implication;
hence, while the first belongs to deductive logic, the second belongs to in- 
ductive logic. Generally speaking, the assertion of purely logical sentences, 
whether in deductive or in inductive logic, can never violate empiricism; 
if they are false, they violate the rules of logic. The principle of empiricism 
can be violated only by the assertion of a factual (synthetic) sentence 
without a sufficient empirical foundation or by the thesis of apriorism
when it contends that for knowledge with respect to certain factual sen- 
tences no empirical foundation is required. 

The fact that probability₁ is relative to given evidence and that therefore
a complete statement of probability₁ must contain a reference to the
evidence is very important. Keynes was the first to emphasize this relativity
([Probab.], pp. 6 f.). The omission of any reference to evidence is often 
harmless if the elliptical nature of the statement is clearly recognized. 
However, this omission was the general custom with earlier authors, and 
it often caused lack of clarity. It had sometimes the effect that the authors 
overlooked the relativity of probability₁ and thus came to the belief that
probability₁ was dependent upon our knowledge and that hence the
validity of a statement on probability₁ was merely subjective. At other times
it led to the belief that this validity was dependent upon certain physical 
facts. I think that a certain fundamental discrepancy in Kries’s concep- 
tion of probability is perhaps to be explained by his neglect of the evidence 
as an essential argument to the concept of probability. On the one hand, 
he speaks of the probability of assumptions or expectations; he refers to 
“logical connections which, when some things are regarded as certain, 
constitute for other things a more or less great probability” (of a com- 
parative, not a quantitative, nature) ([Prinzipien], p. 26); this shows that 
he means probability₁, not probability₂ (which he rejects explicitly, pp.
18 ff.). On the other hand, he says that the probability sentences have an 
empirical content (e.g., p. 170). It seems to me that his neglect of the rela- 
tivity with respect to the evidence has led him to the mistake of ascribing 
the factual content of our knowledge, e.g., concerning the physical condi- 
tions of the way in which a die is thrown or a roulette is played, to the 
probability sentence itself instead of to the evidence. Reichenbach’s view
that even those probability statements which concern what he calls
“weight” or “the logical concept of probability” are of an empirical
nature ([Experience] §§ 32-34; cf. our discussion below, in § 41E) is perhaps
to be explained in a similar way. He feels correctly that the statement
‘the weight (or predictional value) of the prediction that it will rain to- 
morrow is 3/4’ must somehow be based upon our empirical knowledge, in 
particular, our observation of the present weather situation and statistical 
results concerning past weather observations, especially the relative fre- 
quency with which rain has been observed to follow upon weather situa- 
tions similar to that of today. This leads him to the conception that the 
statement about the weight 3/4 must itself be interpreted as a statement 
concerning the observed relative frequency 3/4 and hence as being itself 
a factual, empirical statement. Our conception agrees with Reichenbach’s 
with respect to the point that the value of the weight, our probability₁, is
based on the observed relative frequency; but we regard the weight state- 
ment as elliptical. The relevant empirical knowledge, including the obser- 
vation of the present state of the weather and the past results, especially 
the observed relative frequency, is to be expressed in the evidence e; and 
the complete formulation of the weight statement is a statement on
probability₁ which does not contain e and is not derived from e but instead
contains a reference to e. Thus our empirical knowledge does not constitute
a part of the content of the probability₁ statement (which would make this
statement empirical) but rather of the sentence e which is dealt with in
the probability₁ statement. Thus the latter, although referring to
empirical knowledge, remains itself purely logical.

Many writers on probability formulate a reference to the evidence in 
the form of a conditional clause, for example: 


(1) ‘If an urn contains a hundred balls, of which seventy are white and
thirty black, then the probability that the next ball drawn from
this urn will be white is 0.7’.


This formulation is preferable to the elliptical one because the danger of 
overlooking the evidence is here avoided. But it is not quite correct. In a 
genuine conditional sentence of the form ‘If A, then B’, for example: 


(2) ‘If it rains, Jack will not come’, 


the main clause B is a meaningful sentence, complete in itself (if we leave 
aside cases involving pronouns). On the other hand, the main clause in 
the probability statement (1) ‘the probability that the next ball drawn 
from this urn will be white is 0.7’ is incomplete because a reference to
evidence is lacking. Many authors, even among the best contemporary
writers on probability₁, have sometimes used the conditional formulation of
probability statements. In some cases they have been misled by this form
to the view that the evidence expressed in the conditional clause, if known,
provides a ground or premise from which the probability expressed in the
main clause can be inferred. This view is based on a false analogy due to 
the incorrect conditional formulation. If a genuine conditional statement 
‘If A, then B’ is known and ‘A’ is supplied as additional information, ‘B’ 
can indeed be inferred. Thus there is the temptation to proceed analogous- 
ly with (1). Suppose that (1) is known as an instance of a general theorem 
on probability and that we are given the information that this particular 
urn contains seventy white and thirty black balls; then we might be in- 
clined to say that we can infer from this information that the probability 
of drawing a white ball is 0.7. However, this alleged conclusion is
incomplete and hence, strictly speaking, meaningless. If we wish to use the word
‘inference’, as is customary, in a wider sense than it has in deductive logic
so that we can speak of ‘nondeductive’ or ‘nondemonstrative inference’
or positively of ‘probability inference’ or ‘inductive inference’ (§ 44B), we
may say that the hypothesis h is inductively inferred from the evidence e.
But in this case we must be careful not to overlook the fact that the 
probability value characterizes not the hypothesis (‘the next ball will be 
white’) but rather the inference from the evidence to the hypothesis or, 
more correctly speaking, the logical relation holding between the evidence 
and the hypothesis. 
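
The point that the probability value belongs to the pair of hypothesis and evidence, and not to the hypothesis alone, may be illustrated by a small sketch. It rests on assumptions introduced only for the illustration: the evidence is encoded as the stated composition of the urn, the names `UrnEvidence` and `confirmation_next_white` are hypothetical, and the value is simply taken to be the stated proportion of white balls, as in example (1); no general definition of degree of confirmation is intended.

```python
from fractions import Fraction
from typing import NamedTuple

class UrnEvidence(NamedTuple):
    """Evidence e: the stated composition of the urn (hypothetical encoding)."""
    white: int
    black: int

def confirmation_next_white(e: UrnEvidence) -> Fraction:
    """Value for the pair (h, e), with h = 'the next ball drawn will be white'.

    Assumption for this sketch: the value is the proportion of white balls
    stated in e. The function cannot be called without e, which mirrors the
    incompleteness of a probability statement lacking a reference to evidence.
    """
    return Fraction(e.white, e.white + e.black)

if __name__ == "__main__":
    e1 = UrnEvidence(white=70, black=30)
    e2 = UrnEvidence(white=20, black=80)
    print(confirmation_next_white(e1))  # 7/10
    print(confirmation_next_white(e2))  # 1/5
```

The same hypothesis receives different values on different evidence, and in both cases the value is computed from the content of e alone, without drawing any ball.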

Thus we see that from the evidence e together with the statement ‘the 
probability of h with respect to e is 1/5’ we can infer (in the strict sense
of this word) neither h itself, which may be false, nor a statement of the
probability of h, which would be meaningless. In fact, nothing can be
inferred from those two premises (except, trivially, for those conclusions
which follow from e alone). This negative answer to an often discussed
problem will become clear in the course of the later development of our
theory.


B. Probability₂, Relative Frequency

1. The two arguments are properties, kinds, classes, usually of events 
or things. [As an alternative, the predicate expressions designating the 
properties might be taken as arguments. In the present case, however, in 
distinction to (A1), there does not seem to be any advantage in this method;
cf. § 52.]

2. An elementary statement of probability₂ is factual and empirical; it
says something about the facts of nature and hence must be based upon
empirical procedure, the observation of relevant facts. From these
elementary statements the theorems of a mathematical theory of probability₂
must be clearly distinguished. The latter do not state a particular value
of probability₂ but say something about connections between probability₂
values in a general way, usually in a conditional form, for example: ‘if the
values of such and such probabilities₂ are q₁ and q₂, then the value of a
probability₂ related to the original ones in a certain way is such and such
a function, say, product or sum, of q₁ and q₂.’ These theorems are not
factual but L-true (analytic). Thus a theory of probability₂, e.g., the system
constructed by Mises or that by Reichenbach, is not of an empirical but of
a logicomathematical nature; it is a branch of mathematics,
fundamentally different from any branch of empirical science, e.g., physics.

Mises has repeatedly stated (e.g., [Comments 1], p. 45) that his theory of 
probability is empirical, is a branch of the natural sciences like physics. How- 
ever, his theorems, although referring to mass phenomena, are quite obviously 
purely analytic; the proofs of these theorems (in distinction to examples of ap- 
plication) make use only of logicomathematical methods, in addition to his 
definition of ‘probability’, and not of any observational results concerning mass 
phenomena. Therefore his theory belongs to pure mathematics, not to physics. 


This point has been discussed in detail and completely clarified by F. Wais- 
mann [Wahrsch.], pp. 239 f. 


We shall sometimes call probability₂, in distinction to probability₁, an
empirical concept. This is not to be understood as saying that its definition 
refers to nonlogical concepts, which is obviously not the case, but merely 
as saying that its ordinary application, that is, its application to factual 
properties as arguments, is to be formulated in factual, empirical state- 
ments; in other words, the determination of its values in ordinary cases 
is an empirical procedure. Probability₂ is in this respect similar to the
concept of the cardinal number of a property. The definition of the latter 
concept is likewise purely logical; nevertheless, its application to factual 
properties leads to factual, empirical statements, and its values in these 
cases are found by the empirical procedure of counting. 

In spite of the fundamental difference between the concepts of
probability₁ and probability₂, many theorems concerning these concepts show
a striking analogy. Later discussions will throw some light from various
angles on the basis of this analogy. We shall see that in certain cases
probability₁ may be interpreted as an estimate of relative frequency or
probability₂ (§ 41D). Later, on the basis of an analysis of sentences with the
help of their ranges, it will be seen that probability₁ can likewise be
regarded as a ratio of the measures of two classes (see § 55B); but there
remains the important difference that in this case the ratio is determined
in a purely logical way, while in the case of probability₂ it is determined
empirically.

A terminological remark concerning the word ‘event’ seems required in
view of (A1) and (B1). It is very important to distinguish clearly between
kinds of events (war, birth, death, throw of a die, throw of this die, throw 
of this die yielding an ace, etc.) and events (Caesar’s death, the throw of 
this die made yesterday at 10:00 A.M., the series of all throws of this die 
past and future). This distinction is particularly important for discussions 
on probability, because one of the characteristic differences between the 
two probability concepts is this: the first concept refers sometimes to two 
events, the second to two kinds of events (see A1 and B1). Many authors
on probability use the word ‘event’ (or the corresponding words ‘Ereignis’
and ‘événement’) when they mean to speak, not about events, but about 
kinds of events. This usage is of long standing in the literature on prob- 
ability, but it is very unfortunate. It has only served to reinforce the cus- 
tomary neglect of the fundamental difference between the two probability 
concepts, which arose originally out of the ambiguous use of the word 
‘probability’, and thereby to increase the general confusion in discussions 
on probability. The authors who use the term ‘event’ when they mean 
kinds of events get into trouble, of course, whenever they want to speak 
about specific events. The traditional solution is to say ‘the happenings 
(or occurrences) of a certain event’ instead of ‘the events of a certain 
kind’; sometimes the events are referred to by the term ‘single event’. 
But this phrase is rather misleading; the important difference between 
events and kinds of events is not the same as the inessential difference 
between single events (the first throw made today with this die) and mul- 
tiple or compound events (the series of all throws made with this die). 
Keynes, if I interpret him correctly, has noticed the ambiguity of the 
term ‘event’. He says ([Probab.], p. 5) that the customary use of phrases 
like ‘the happening of events’ is “vague and unambiguous”, which I sup- 
pose to be a misprint for “vague and ambiguous”; but he does not specify 
the ambiguity. He proposes to dispense altogether with the term ‘event’ 
and to use instead the term ‘proposition’. Subsequent authors dealing with 
probability₁, like Jeffreys, for example, have followed him in this use.

Many authors have made a distinction between two (or sometimes
more) kinds of probability. Some of these distinctions are quite different
from the distinction made here between probability₁ and probability₂. For
instance, a distinction is sometimes made between mathematical proba- 
bility and philosophical probability; their characteristic difference ap- 
pears to be that the first has numerical values, the second not. However, 
this difference seems hardly essential; we find both a concept with numeri- 
cal values and one without, in other words, both a quantitative and a com- 
parative concept on either side of our distinction between the two funda- 
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mentally different meanings of ‘probability’. Another distinction has been 
made between subjective and objective probability. However, I believe 
that practically all authors really have an objective concept of probability 
in mind and that the appearance of subjectivist conceptions is in most 
cases caused only by occasional unfortunate formulations; this will be 
discussed soon (§ 12). 

Other distinctions which have been made are more or less similar to 
our distinction between probability₁ and probability₂. For instance, F. P. 
Ramsey ([Foundations] (1926), p. 157) says: “... the general difference 
of opinion between statisticians who for the most part adopt the fre- 
quency theory of probability and logicians who mostly reject it renders 
it likely that the two schools are really discussing different things, and 
that the word ‘probability’ is used by logicians in one sense and by statis- 
ticians in another”. 

It seems to me that practically all authors on probability have meant 
either probability₁ or probability₂ as their explicandum, despite the fact 
that their various explanations appear to refer to a number of quite differ- 
ent concepts. 

For one group of authors, the question of their explicandum is easily 
answered. In the case of all those who support a frequency theory of prob- 
ability, i.e., who define their explicata in terms of relative frequency (e.g. 
as a limit or in some other way), there can be no doubt that their 
explicandum is probability₂. Their formulations are, in general, presented in 
clear and unambiguous terms. Often they state explicitly that their expli- 
candum is relative frequency. And even in the cases where this is not done, 
the discussion of their explicata leaves no doubt as to what is meant as 
explicandum. 

This, however, covers only one of the various conceptions, i.e., explicata 
proposed, and only one of the many different explanations of explicanda 
which have been given and of which some examples were mentioned 
earlier. It seems clear that the other explanations do not refer to the sta- 
tistical, empirical concept of relative frequency, and I believe that prac- 
tically all of them, in spite of their apparent dissimilarity, are intended to 
refer to probability₁. Unfortunately, many of the phrases used are more 
misleading than helpful in our efforts to find out what their authors ac- 
tually meant as explicandum. There is, in particular, one point on which 
many authors in discussions on probability₁, or on logical problems in 
general, commit a certain typical confusion or adopt incautiously other 
authors’ formulations which are infected by this confusion. I am referring 
to what is sometimes called psychologism in logic. This will be discussed 
in the next two sections. 


§ 11. Psychologism in Deductive Logic 


Logical relations, e.g., logical consequence, are (i) logical, i.e., nonfactual, 
based merely upon meanings, (ii) objective, i.e., not dependent upon anybody’s 
thinking about them. Most logicians treat them within their systems as 
objective relations, but, in spite of this, many characterize them in their general 
preliminary remarks in subjectivistic terms, e.g., with reference to actual think- 
ing or believing. We call this discrepancy primitive psychologism in (deductive) 
logic. A qualified psychologism refers, not to actual, but to correct or rational 
thinking. This is usually meant in an objectivistic sense; in this case, the refer- 
ence to thinking is gratuitous. 


Those who work in the history of science or the methodology of science 
are familiar with the fact that there is frequently a discrepancy between 
what an author actually does and what he says he does; in particular, be- 
tween the sense in which he actually uses a term or a sentence and the 
sense which he explicitly attributes to it. This holds especially for ab- 
stract terms and general principles. Consequently, in order to find out 
which sense a certain term has for the author, it is often not sufficient to 
look at his explicit explanations. We should also examine how he uses the 
term and, especially, how he argues pro or con statements in which the 
term occurs. And if these two tests are not in good agreement, the latter 
is more reliable than the first; it gives a better indication of the actual 
sense of the term for the author, that is, his general habit of using it. Sup- 
pose, for instance, we wish to know what a certain historian or political 
scientist means by ‘democracy’. The best way is to observe under what 
conditions he applies this term and, still more important, what reasons 
he gives for these applications; we can accelerate the procedure by asking 
him questions as to whether and why he would apply the term to a coun- 
try whose form of government was such and such. Of course, the direct 
way of asking: “What do you mean by ‘democracy’?” is much simpler 
and quicker, and in many cases it will do. But there is always the danger 
that, instead of defining his actual meaning, he will give a definition which 
he has read in a theoretical book by a political scientist or even by a phi- 
losopher. 

The discrepancy here discussed is likewise found in exact fields. Frege 
has repeatedly shown (especially in his Über die Zahlen des Herrn H. 
Schubert [Jena, 1899]) that the definitions of ‘number’ given by some 
mathematicians are deplorably inadequate and would lead to absurd 
and never intended applications, while the actual use of the term in the 
construction of a theory of numbers is quite correct. 

The discrepancy discussed takes a special form in the case of logic. 
Before we approach the logical concept of probability₁, which is one of the 
fundamental concepts of inductive logic, let us look at the older and more 
familiar field of deductive logic, logic in the narrower sense. The task of 
logic (in this sense) has been the same for Aristotle as for modern (sym- 
bolic) logic, although the form of the systems constructed for the solution 
of this task has undergone considerable change in the course of the de- 
velopment. The task is the establishment of certain relations between sen- 
tences (or the propositions expressed by the sentences) usually called logi- 
cal relations, among them, as one of the fundamental concepts of logic, 
the relation of logical consequence or deducibility. We cannot give here a 
full and exact characterization of these relations but will only indicate 
some of their characteristics. (i) They are independent of the contingency 
of the facts of nature, hence formal (in the traditional, not the syntactical, 
sense; see [Semantics], p. 232, meaning II); consequently, for ascertaining 
one of these relations in a concrete case, we need only know the meanings 
of the sentences involved, not their truth-values. (ii) The relations are 
objective, not subjective, in this sense: whether one of these relations does 
or does not hold in a concrete case is not dependent upon whether or what 
any person may happen to imagine, think, believe, or know about these 
sentences. As an example, let i be the sentence ‘all swans are white’, and 
j be ‘all nonwhite things are nonswans’, and suppose we have come to an 
agreement as to the meaning of all terms occurring. Suppose that a per- 
son X believes at the present time that j is a logical consequence of i, while 
at an earlier time he believed that this was not the case. That the rela- 
tion is objective is meant in this sense: the change in X’s belief about the 
relation has no effect upon the status of the relation itself; if his present 
belief is right (as I think it is), then his former belief was wrong; and, 
if his former belief was right, his present belief is wrong. It does not 
even make sense to assume that each of the two beliefs was right at its 
time, i.e., that the relation of logical consequence holds now between 
the two sentences but did not hold at the former time; this relation 
is timeless, i.e., it has no time value as argument. I hope that nobody 
will misinterpret my statement of the objectivity of logical relations 
as a metaphysical statement of the “subsistence” of these relations in 
a Platonic heaven (as earlier statements of mine have been misinter- 
preted). The statement is intended merely to point out the following 
character which logical concepts share with physical concepts—from 
which they are fundamentally different in other respects: a sentence which 
ascribes one of these concepts in a concrete case (e.g., ‘j is a consequence 
of i’, like ‘this stone is heavier than that’) is complete without any 
reference to the properties or the behavior of any person. (This is not in 
contradiction to the obvious fact that the recognition of a logical or a physical 
or any other kind of relation involves a person.) In distinction to logical 
and physical concepts, certain other concepts are subjective in this sense: 
their application requires a reference to a person or a kind of person; e.g., 
‘known’, ‘familiar’, ‘pleasant’, ‘confirmed’ (in the pragmatical sense as 
distinguished from the semantical sense, in which we take the term in our 
discussions here, see § 8). For example, ‘this pattern is familiar’ is not a 
complete sentence; it must be supplemented by something like ‘to me’, 
‘to Mr. X’, ‘to the persons of such and such a class’. 
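
The swan example above can be made concrete in a small sketch. It rests on assumptions that are not in the text: a toy domain of three objects, each described only by the two properties 'swan' and 'white', and hypothetical function names. Under these assumptions, whether j is a consequence of i is settled by surveying all possible cases, with no reference to what any person believes at any time. The sentence k ('all white things are swans') is added only for contrast.

```python
from itertools import product

# Each object is a pair (is_swan, is_white); a world is a tuple of objects.

def all_swans_white(world):
    # i: every object that is a swan is white
    return all(white for swan, white in world if swan)

def all_nonwhite_nonswans(world):
    # j: every object that is not white is not a swan
    return all(not swan for swan, white in world if not white)

def all_white_swans(world):
    # k (added for contrast): every white object is a swan
    return all(swan for swan, white in world if white)

def is_consequence(premise, conclusion, domain_size=3):
    # The conclusion follows if no possible world makes the premise
    # true and the conclusion false.
    objects = list(product([False, True], repeat=2))
    for world in product(objects, repeat=domain_size):
        if premise(world) and not conclusion(world):
            return False
    return True

print(is_consequence(all_swans_white, all_nonwhite_nonswans))
print(is_consequence(all_swans_white, all_white_swans))
```

The check takes only the two sentences as arguments; nothing in it refers to a person or a time, which is the sense of objectivity described above. A survey of small finite worlds is of course only an illustration, not a general proof procedure.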

This objectivist conception of logic (in this section always understood 
in the sense of deductive logic), the view that the concepts of logic and 
hence the principles and theorems of logic which employ these concepts 
are objective, is certainly not new. On the contrary, it characterizes the 
work of practically all logicians. When they lay down their principles and 
rules or, on this basis, solve a logical problem, they do so in objectivist 
formulations, from Aristotle on through the Aristotelian tradition, up to 
modern logic. They say, for instance, ‘from premises of the form so and so, 
a conclusion of the form so and so follows’, or ‘. . . is deducible’, or ‘the 
deduction (inference) of . . . from . . . is valid’, or the like. Here, for the 
work within their systems, they would hardly ever use subjectivist for- 
mulations, that is, those referring to persons, for instance, ‘such and 
such an inference is valid for me now’, or ‘. . . valid for persons of an 
introverted type’. And, in order to find out whether a certain conclusion 
follows from given premises, they do not in fact make psychological ex- 
periments about the thinking habits of people but rather analyze the given 
sentences and show their conceptual relations. However, if we examine 
not their actual procedure in solving logical problems but their general 
remarks concerning the task and nature of logic, chiefly in the introduc- 
tory sections of their books, we often find something entirely different. 
Here, logic is often characterized as the art of thinking, and the principles 
of logic are called principles or laws of thought. These and similar formula- 
tions refer to thinking and hence are of a subjectivist nature. These ref- 
erences to thinking are in most cases entirely out of tune with what the 
same author does in the body of his work. Thus we have here a special 
case of the discrepancy discussed in the beginning of this section. A dis- 
crepancy of this kind, where the problems themselves are of an objective 
nature but the descriptions by which the author intends to give a general 
characterization of the problems are framed in subjectivist, psychological 
terms (like ‘thinking’), is often called psychologism. Thus formulations of 
the kind mentioned above, frequently occurring in books on logic, are 
instances of psychologism in deductive logic. In some cases we find a situation 
still worse than that just described. It happens sometimes that the author 
does not only mislead the readers by his psychologistic general remarks 
but misleads himself; in this case, we find traces of subjectivism in the 
logical system itself, in the discussion of the logical problems, mixed with 
objective logical components; the result is inevitably rather confusing. 
[The situation is entirely different in cases where not only the general 
characterization but also the discussion of the problems themselves is 
consistently subjectivistic. A procedure of this kind, even if its author 
applies to it the title ‘Logic’, cannot be criticized as psychologism, because 
there is no mixture of heterogeneous components; there is merely a termi- 
nological difference in the use of the term ‘logic’. It seems to me that John 
Dewey’s Logic, the theory of inquiry (New York, 1938) is an instance of 
this kind. This book deals with that kind of behavior which is appropriate 
to problematic situations and leads to their “solutions”; it does not deal 
with logic in our sense (except in a few sections which seem somewhat out 
of place and have little connection with the remainder of the book). The 
fact that many logicians, that is, men who work in the field of logic in our 
sense, have erroneously characterized this field as the art of thinking has 
caused Dewey, who actually works on the art of thinking, that is, the 
theory and technology of procedures for overcoming problematic situa- 
tions, to choose the title ‘Logic’.] 

We find psychologism in deductive logic not only in the literature of 
traditional logic but also in that of modern logic. A conspicuous example 
is the title of the book which may be regarded as marking the beginning 
of modern symbolic logic, Boole’s Laws of thought. But one of the impor- 
tant achievements in the development of modern logic has been the 
gradual elimination of psychologism and the gradual clarification of the 
nature of logic. It seems that the great majority of contemporary writers 
in modern logic—though not those in logic of the traditional style—are 
free of psychologism. This is chiefly due to the efforts of the mathemati- 
cian, Gottlob Frege, and the philosopher, Edmund Husserl, who empha- 
sized the necessity of a clear distinction between empirical psychological 
problems and nonempirical logical problems and pointed out the confu- 
sion caused by psychologism. In this respect, they have also influenced 
indirectly the attitude of many logicians who have never read their works. 

For Frege’s emphasis on the objectivity of logic and arithmetic and his re- 
jection of psychologism see his Grundlagen der Arithmetik (1884), §§ 26, 27, and 
Grundgesetze der Arithmetik, Vol. I (1893), Preface, pp. xiv ff. Husserl’s own 
position was originally psychologistic (Philosophie der Arithmetik [1891]); but 
later, under the influence of Frege, he became one of the prominent opponents 
of psychologism (Logische Untersuchungen, Vol. I [1900], Preface and chaps. 
3-11). Concerning this development of Husserl’s views cf. Marvin Farber, The 
foundation of phenomenology (1943). 


A primitive psychologistic explanation of the relation of logical 
consequence would perhaps be somewhat like this. That j is a logical 
consequence of i means that, if somebody believes in i, he cannot help believing 
also in j. Now, in fact, a psychologistic explanation will hardly ever be 
given in this crude form, because its inadequacy is too obvious. Taken 
literally, the explanation given would require us to investigate the statisti- 
cal results of series of psychological experiments. There are not many 
logicians who would regard this procedure as appropriate. 

A nice illustration, though not meant quite seriously, of primitive 
psychologism in arithmetic—which is part of deductive logic—is the following passage 
by P. E. B. Jourdain (The philosophy of Mr. B*rtr*nd R*ss*ll [1918], p. 88, 
quoted by Jeffreys [Probab.], p. 37): “I sometimes feel inclined to apply the 
historical method to the multiplication table. I should make a statistical inquiry 
among school children, before their pristine wisdom had been biased by teach- 
ers. I should put down their answers as to what 6 times 9 amounts to, I should 
work out the average of their answers to six places of decimals, and should then 
decide that, at the present stage of human development, this average is the 
value of 6 times 9.” 


Many logicians prefer formulations which may be regarded as a kind of 
qualified psychologism. They admit that logic is not concerned with the 
actual processes of believing, thinking, inferring, because then it would 
become a part of psychology. But, still clinging to the belief that there 
must somehow be a close relation between logic and thinking, they say 
that logic is concerned with correct or rational thinking. Thus they might 
explain the relation of logical consequence as meaning: ‘if somebody has 
sufficient reasons to believe in the premise i, then the same reasons justify 
likewise his belief in the conclusion j’. It seems to me that psychologism 
thus diluted has virtually lost its content; the word ‘thinking’ or ‘believ- 
ing’ is still there, but its use seems gratuitous. The explanation of logical 
consequence just mentioned does not say more than a formulation in 
nonpsychologistic, objectivist terms, for instance: ‘any evidence for i is 
also evidence for j’; or: ‘if i is true, then j is necessarily also true’ (where 
‘necessarily’ means not more than ‘in any possible case, no matter what 
the facts happen to be’); indeed, we might say that the formulation in 
terms of justified belief is derivable from this one. Hence that formulation 
is not wrong. The characterization of logic in terms of correct or rational 
or justified belief is just as right but not more enlightening than to say 
that mineralogy tells us how to think correctly about minerals. The 
reference to thinking may just as well be dropped in both cases. Then we say 
simply: mineralogy makes statements about minerals, and logic makes 
statements about logical relations. The activity in any field of knowledge 
involves, of course, thinking. But this does not mean that thinking be- 
longs to the subject matter of all fields. It belongs to the subject matter 
of psychology but not to that of logic any more than to that of mineralogy. 

Because of the frequent discrepancy between introductory general re- 
marks and the actual working theory of an author, we ought to be cau- 
tious in judging the latter on the basis of the former. The fact that an au- 
thor uses occasionally some psychologistic formulations in general re- 
marks about the task of logic, or in preliminary explanations of the mean- 
ing of some fundamental terms in logic, is not a sufficient reason for assum- 
ing that he has a subjectivistic conception of logic. If those explanations 
are in terms of correct or rational or justified thinking rather than of 
actual thinking, then in most cases they are not even subjectivistic. The 
reference to correctness or justification is presumably meant in the sense of 
‘in accordance with the rules of logic’; and these rules are regarded as ob- 
jective by most logicians. The decisive point to examine is the way in 
which an author solves his logical problems, demonstrates logical theo- 
rems. If here his procedure is objectivistic, that is, free from references to 
the features of actual processes of thinking, then we have to regard his 
logic as objectivistic. This holds even if we find in his general remarks for- 
mulations not only of qualified but of primitive psychologism. If his work- 
ing procedure is objectivistic, his occasional psychologistic formulations 
should be regarded as inessential relics from a traditional way of speech 
rather than as characteristics of his system of logic. 

This view concerning the interpretation of psychologistic formulations 
in deductive logic, where the situation is relatively simple, will help us in 
understanding the analogous situation in the field of inductive logic, where 
the situation is at the present time much less clear. 


§ 12. Psychologism in Inductive Logic 


The situation with respect to psychologism in inductive logic, i.e., in the the- 
ory of probability, is analogous to that in deductive logic. We analyze here the 
formulations of some authors in two groups. A. Those who characterize proba- 
bility as a logical relation similar to logical consequence (e.g., Keynes, Jeffreys). 
Here we find the systems themselves thoroughly objectivistic, but some general 
remarks show qualified psychologism, e.g., explanations of probability as 
degree of reasonable or justified belief; the concept meant is clearly probability₁. 
B. Authors of the classical theory of probability (e.g., Bernoulli, Laplace). Here, 
we find, in addition, formulations of primitive psychologism, e.g., explanations 
of probability as degree of belief or expectation. Nevertheless, it seems to me 
that their theories themselves were objectivistic; and, further, that they meant 
in most cases probability₁, not probability₂. 


A. Probability as a Logical Relation 


Deductive logic may be regarded as the theory of the relation of logical 
consequence, and inductive logic as the theory of another concept which 
is likewise objective and logical, viz., probability₁ or degree of 
confirmation. That probability₁ is an objective concept means this: if a certain 
probability₁ value holds for a certain hypothesis with respect to a certain 
evidence, then this value is entirely independent of what any person may 
happen to think about these sentences, just as the relation of logical con- 
sequence is independent in this respect. Consequently, a definition of an 
explicatum for probability₁ must not refer to any person and his beliefs 
but only to the two sentences and their logical properties within a given 
language system. 

Now we shall show that the situation with respect to psychologism in 
inductive logic is in all essential respects analogous to that in deductive 
logic as discussed in the preceding section. 

We have previously (§ 9) classified the theories of probability in three 
groups. In one of these groups the frequency conception of probability is 
adopted; here, the explicandum is obviously probability₂. The other two 
conceptions are the classical one (Bernoulli, Laplace) and the conception 
of probability as a logical concept related to deducibility (Keynes, 
Jeffreys). 

Our problem is to discover what is the explicandum for the various au- 
thors of these two remaining groups. Let us begin with the last-mentioned 
group. Here, it will be easy to see that the explicandum is the objective, 
logical concept of probability₁. But even here we shall find psychologistic 
formulations. This fact will help us later in the analysis of classical au- 
thors to look through the deceiving shell of psychologistic formulations to 
the objectivistic core of their conception. 

Keynes makes it quite clear that he regards probability as an objective, 
logical concept: “In the sense important to logic, probability is not sub- 
jective. It is not, that is to say, subject to human caprice. A proposition is 
not probable because we think so. When once the facts are given which 
determine our knowledge, what is probable or improbable in these circum- 
stances has been fixed objectively, and is independent of our opinion. The 
Theory of Probability is logical, therefore” ([Probab.], p. 4). Keynes 
admits that probability may also be called subjective in another sense; it 
seems to me that here the term ‘relative’, in the sense of ‘relating to a 
second proposition as evidence’, would be more appropriate. He says (p. 4, 
in a passage immediately preceding the above quotation): “A proposition 
is capable at the same time of varying degrees of this relationship [of 
probability], depending upon the knowledge to which it is related, so 
that it is without significance to call a proposition probable unless we spec- 
ify the knowledge to which we are relating it. To this extent, therefore, 
probability may be called subjective. But in the sense ...”. Then the 
preceding quotation follows, which makes it clear that Keynes’s concept is 
in no respect meant as subjective in the sense opposite to objective. 

Now it is interesting to see that Keynes, immediately following the 
passage quoted above in which he explicitly emphasizes the objective, 
logical nature of his concept, uses formulations of the kind which we have 
previously called qualified psychologism. He says: “The Theory of Prob- 
ability is logical, therefore, because it is concerned with the degree of be- 
lief which it is rational to entertain in given conditions, and not merely 
with the actual beliefs of particular individuals, which may or may not be 
rational” (p. 4, italics in the original). His explicit contrasting of rational 
versus actual degree of belief and the use of ‘because’ show clearly that 
the reference to beliefs is not intended to modify in any way the charac- 
terization of the concept as a logical one or to bring in a subjective com- 
ponent. This will make us hesitant to interpret similar formulations of 
other authors as genuine symptoms of a subjectivistic conception. The 
situation here is analogous to that in deductive logic. Suppose that the 
hypothesis h has the probability₁ q with respect to the evidence e. Then, 
indeed, it follows that if somebody knows e and nothing else, he is 
justified in believing in h to the degree q and likewise justified in acting 
accordingly, e.g., in betting on h with q against 1 − q. But this reference to 
belief should be avoided in a characterization of probability₁, because it 
blurs the important boundary line between logical and psychological 
concepts. Of course, in incidental informal explanations of probability₁, 
references to believing and betting will often facilitate the understanding— 
as in analogous cases in deductive logic and mathematics—but care should 
be taken that these references to something extra-logical do not obscure 
the nature of probability₁ as a purely logical concept. 
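
The betting remark in this paragraph can be illustrated by a little arithmetic. The sketch below is one possible reading, not a formulation taken from the text: it assumes that 'justified' means that the bet has zero expected gain when the expectation is computed with q, and the function name and the value 2/3 are hypothetical.

```python
from fractions import Fraction

def expected_gain(p, stake_on_h, stake_against_h):
    """Expected net gain of a bettor who puts up stake_on_h and wins
    stake_against_h if h is true, computed with probability p for h."""
    return p * stake_against_h - (1 - p) * stake_on_h

q = Fraction(2, 3)  # hypothetical value for h on evidence e

# Betting on h with q against 1 - q.
fair = expected_gain(q, q, 1 - q)

# Even stakes on the same hypothesis, for comparison.
even = expected_gain(q, Fraction(1, 2), Fraction(1, 2))

print(fair, even)
```

With stakes q against 1 − q the two terms cancel, so the bet favors neither side; with any other ratio of stakes one side is favored. The computation uses only q and the stakes, which fits the point that the betting formulation adds nothing to the logical concept.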

That the objective logical concept meant by Keynes is the same as 
what we call probability₁, i.e., the logical concept of confirmation, becomes 
quite clear both by numerous preliminary explanations and by his 
reasonings in the construction of his system. He says, for instance: “. . . a logical 
connection between one set of propositions which we call our evidence 
and which we suppose ourselves to know, and another set which we call 
our conclusions, and to which we attach more or less weight according to 
the grounds supplied by the first” (p. 5 f.). Keynes takes the concept in 
general as nonquantitative, similar to our comparative concept of con- 
firmation; only in special cases does his theory allow the attribution of 
numerical values like our quantitative concept of degree of confirmation. 

It is true, some statements of Keynes concerning his concept of 
probability are not in agreement with our conception of probability₁. He says, 
for example: “A definition of probability is not possible. . . . We cannot 
analyze the probability-relation in terms of simpler ideas” (p. 8); later he 
speaks of “a faculty of direct recognition of many relations of probability” 
(p. 53) by a kind of “logical intuition” (p. 52). But I do not think that this 
is evidence against our interpretation of his concept in the sense of our 
probability₁. It is one question whether two persons mean the same by 
certain terms and quite another question whether or not they agree in 
their opinions concerning the thing meant. 

With other representatives of this group the situation is on the whole 
similar. We see easily from their systematic constructions and often also 
from explicit explanations that their explicandum is an objective, logical 
concept and, more specifically, that it is probability₁. Often, but not 
always, we find also psychologistic formulations, mostly of the qualified 
form. For the reasons earlier discussed, we do not regard these formula- 
tions as symptoms of a genuinely subjectivist conception but merely as 
vestiges of an old tradition that has been overcome in substance but still 
lingers on in some forms of speech. 

The general remarks just made may be illustrated by some brief ref- 
erences to some authors of this group. 

That Jeffreys understands ‘probability’ in the sense of probability₁ 
becomes abundantly clear through his whole theory. The very first sentence 
of the preface of his chief work ([Probab.], p. v) describes his aim “to 
provide a method of drawing inferences from observational data”. He begins 
with a comparative concept with three arguments (“on data p, q is more 
probable than r”, p. 15), from which he develops a quantitative concept 
by suitable conventions (p. 19). The whole conception is thoroughly ob- 
jectivistic but accompanied by occasional formulations of qualified psy- 
chologism, e.g., “The probability, strictly, is the reasonable degree of con- 
fidence” (p. 20), “reasonable degree of belief” (p. 31). 

F. P. Ramsey’s conception of probability seems at first inspection more 
psychological and subjectivistic than the conception of most of the other 
authors ([Truth] and [Considerations], both published in [Foundations]; 
my references are to the latter book). He says that the theory of 
probability is “the logic of partial belief” (pp. 159, 166); “we must therefore 
try to develop a purely psychological method of measuring belief” (p. 
166); “I propose to take as a basis a general psychological theory” 
(p. 173). Thus it is not surprising that many authors have judged Ram- 
sey’s conception as a particularly clear case of subjectivism. However, it 
seems to me that a closer examination is apt to evoke serious doubts 
about this judgment. It is true that the psychological method of measur- 
ing the actual degree of belief of a person in a proposition plays a central 
role in Ramsey’s discussion. But he does not define probability as or 
identify it with actual degree of belief. He says: “It is not enough to meas- 
ure probability; in order to apportion correctly our belief to the probability 
we must also be able to measure our belief”; “if the phrase ‘a belief two- 
thirds of certainty’ is meaningless, a calculus [viz., the theory of proba- 
bility] whose sole object is to enjoin such beliefs will be meaningless also” 
(both on p. 166; the italics are mine). Thus, he regards the theory of 
probability not as a part of psychology describing the actually occurring 
degrees of belief but rather as a part of logic giving standards or norms 
which tell us which degrees of belief we should entertain if we want to be 
rational and consistent in our beliefs. This interpretation seems confirmed 
by his statement that “the laws of probability are laws of consistency, an 
extension to partial beliefs of formal logic, the logic of consistency” (p. 
182); “having degrees of belief obeying the laws of probability implies a 
further measure of consistency, namely such a consistency between the 
odds acceptable on different propositions as shall prevent a book being 
made against you”. This shows that the standard imposed upon our be- 
liefs by the theory of probability is regarded as an objective one, viz., 
avoiding certain unfavorable results in betting. Later (p. 191) he charac- 
terizes logic “as the science of rational thought. We found”, he continues, 
“that the most generally accepted parts of logic, namely, formal logic, 
mathematics, and the calculus of probabilities, are all concerned simply 
to ensure that our beliefs are not self-contradictory”. This conception of 
the nature of logic as normative for, rather than descriptive of, beliefs is 
clearly expressed in the following words: “Logic, we may agree, is con- 
cerned not with what men actually believe, but what they ought to be- 
lieve, or what it would be reasonable to believe” (p. 193). This formula- 
tion must clearly be judged as qualified rather than primitive psycholo- 
gism. Therefore our previous consideration that the step from primitive 
to qualified psychologism shows an underlying objectivist conception ap- 
plies also to Ramsey. This judgment seems confirmed by Ramsey’s own 
later remark (written in 1929) concerning his earlier paper ([Truth], writ- 
ten in 1926): “The defect of my paper on probability was that it took 
partial belief as a psychological phenomenon to be defined and measured 
by a psychologist” (p. 256). 

One of the rare cases in which primitive psychologism with respect to 
probability is meant literally is to be found in James Jeans’s discussion 
of the probability waves in quantum mechanics (Physics and philosophy 
[New York, 1943]). We may leave aside here the question as to whether 
the concept of probability used in quantum theory is to be understood in 
the sense of probability₁ or of probability₂; maybe formulations of both 
kinds are possible. At any rate, both concepts are objective; the applica- 
tion of the one is a matter of logic, that of the other a matter of physics; 
neither of them is a psychological concept. Jeans, however, believes that 
probability in quantum theory is something of a mental nature. Hence 
he comes to the conclusion that Dirac’s waves of probability are waves of 
knowledge; “the final picture consists wholly of waves, and its ingredients 
are wholly mental constructs”. Consequently, he sees in this development 
of physics “a pronounced step in the direction of mentalism”. 


B. The Classical Theory of Probability 


Now let us see to what extent psychologism is to be found in the so- 
called classical conception of probability, as originated by Jacob Bernoulli 
and Laplace. This conception shows itself in the definition of probability 
and in the way in which this definition is used; in other words, in the ex- 
plicatum of these authors and their followers. Here, however, we shall not 
discuss their explicatum but their explicandum. We find many psycholo- 
gistic formulations; probability is explained, for instance, as degree of be- 
lief, degree of certainty, and the like. Therefore, many later writers have 
characterized the classical conception as subjectivistic. If those formula- 
tions were taken literally, the theorems on probability would be state- 
ments of psychological laws; most of them would be obviously false just 
as are theorems of deductive logic interpreted as psychological laws, be- 
cause our beliefs are often influenced by irrational factors. Thus it is un- 
derstandable that many adherents of the classical conception seem not to 
feel quite satisfied with these formulations and use, either in addition or 
instead, those of qualified psychologism, for instance, ‘rational degree of 
belief’, and the like. As we have seen earlier, formulations of this kind may 
be regarded as a step toward the elimination of psychologism and are in- 
deed no longer subjectivistic because they presuppose—in most cases 
tacitly—objective standards. Therefore, the occurrence of these formula- 
tions suggests that perhaps the use of primitive psychologistic formula- 
tions is likewise not a proof of a genuinely subjectivist conception but 
merely a customary, though not quite adequate, way of dealing with con- 
cepts which are meant as logical, not psychological. 

Jacob Bernoulli makes some general explanatory remarks about the na- 
ture and application of probability in the beginning of Part Four of his 
Ars conjectandi, a work that marks the beginning of the systematic study 
of probability. He declares that “probability is the degree of certainty 
and differs from it as a part from the whole” (p. 211). The highest cer- 
tainty is attributed by him to those things which we know by revelation, 
reasoning, or sensory perception; all other things have a less perfect meas- 
ure of certainty. All this has a psychologistic sound. It becomes, however, 
quite clear that Bernoulli’s theory of probability which he calls the art 
of conjecture (“ars conjectandi sive stochastice”, p. 213) is not meant 
as a description of actual processes of reasoning but rather as a guide to 
correct and useful reasoning. He defines this art as “the art of measuring 
the probabilities of things as exactly as possible, so that we can always se- 
lect and heed in our judgments and actions that which appears to us as 
better, more suitable, more certain or advisable” (p. 213). 

Similarly Laplace understands ‘probability’ not in a psychological, sub- 
jective sense but in an objective sense. This is clearly shown by some pas- 
sages near the end of his philosophical work ([Essai]; our quotations are 
from the edition of 1921). Here he says that the theory of probability 
makes exact what we feel by a kind of instinct; that it leaves nothing ar- 
bitrary in the choice of our opinions, since, with its help, the most ad- 
vantageous choice can be determined; further, that the theory guides our 
judgments and protects us from illusions (II, 105 f.). 

If the explicandum which the classical authors had in mind was not a 
subjective concept, which objective concept was it? The logical concept of 
probability₁ and the empirical concept of probability₂ are both objective. 
I am inclined to assume that on most occasions, though perhaps with a few 
exceptions, they meant something like probability₁, that is to say, not an 
empirical but a logical concept, which characterizes the strength given to 
a certain hypothesis by some amount of evidence. 

Laplace ([Essai], I, 7) discusses an example of three urns—A, B, C. We 
know that one of them contains only black balls, but we do not know 
which of the three it is; we know further that the two other urns contain 
only white balls. Laplace raises the question as to what is the probability 
that a ball which will be drawn from the urn C will turn out to be black. 

From our present point of view, the essential fact is that Laplace states 
different values of the probability: first one on the basis of the knowledge 
mentioned; then another value which the probability takes on when we 
learn that the urn A contains only white balls; and, finally, a third value 
when we learn, in addition, that B likewise contains only white balls. This 
shows that Laplace is not speaking about probability₂ or any other 
physical property of the urns, because these properties do not change 
when we learn more about the urns. What he means must be something 
that is dependent upon the state of our knowledge; hence it seems likely 
that he means something like the weight of evidence that our knowledge 
gives to a certain hypothesis, in other words, something like probability₁. 
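
Laplace's three values can be recomputed in a few lines. The following Python sketch is an illustration added here, not Laplace's own procedure. It assumes that the three cases (A, B, or C is the urn with the black balls) carry equal weight on the initial knowledge; the names `URNS` and `weight_black_from` are hypothetical. The point it is meant to show is that the value changes only because the evidence changes, while the urns themselves remain as they are.

```python
from fractions import Fraction

URNS = ("A", "B", "C")

def weight_black_from(urn, known_white=()):
    """Weight of 'a ball drawn from `urn` is black' on the given evidence.

    Assumption: exactly one urn holds only black balls, and the cases not
    excluded by the evidence are weighted equally.
    """
    # Cases still admitted by the evidence: the black urn is none of the
    # urns already known to contain only white balls.
    admissible = [u for u in URNS if u not in known_white]
    favorable = [u for u in admissible if u == urn]
    return Fraction(len(favorable), len(admissible))

values = [
    weight_black_from("C"),                          # initial knowledge
    weight_black_from("C", known_white=("A",)),      # A found to be white
    weight_black_from("C", known_white=("A", "B")),  # B found to be white too
]
print([str(v) for v in values])  # ['1/3', '1/2', '1']
```

No physical property of the urn C enters the computation; only the second argument, the state of knowledge, is varied.
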
The formulations by which the classical authors intend to explain what 
they mean by ‘probability’ vary a good deal, even with the same author, 
and are often not as clear as we might wish. Thus we must base our in- 
terpretation also on the way in which they reason about probability in 
their theories. Often when we try to interpret an ambiguous term used 
by an author of another period, in another language or in an unfamiliar 
terminology, we proceed in the following way. Suppose the author in ques- 
tion is known for many valuable results he has found in the same or a 
related field; suppose further that he uses the term in question at certain 
places not in a casual way but in the formulation of theorems which are 
clearly important to him; suppose, finally, that among the meanings of the 
term which come into consideration there is one for which these theorems 
would hold, while they would be false for the other meanings. Then there 
is some reason to regard these facts as supporting the assumption that the 
meaning of the term which makes the theorems true is the one intended 
by the author. Certainly, this method must be used with caution; other- 
wise it would lead to rather arbitrary interpretations and, in the extreme, 
to the absurd result that all assertions of all authors seem to agree with 
our opinions. But as an auxiliary procedure, in combination with a con- 
sideration of the author’s own explanations of the term, it may sometimes 
be helpful. Let us apply this to our case. The classical theory of probability 
contains certain theorems of the following kind. If interpreted in the sense 
of probability₂, these theorems are obviously false (even after certain 
modifications which seem necessary for any interpretation, e.g., the addi- 
tion of a second argument of the probability function). Therefore the 
representatives of the frequency conception have rejected these theorems 
and have even expressed their amazement that any sensible man should 
assume such absurdities. These theorems are, of course, also false if in- 
terpreted in the sense of the psychological concept of degree of belief, as 
are practically all theorems. On the other hand, these theorems are true or 
at least not quite implausible if interpreted in the sense of probability₁. 
(Examples are certain specializations of the controversial principle of in- 
difference; this principle itself in the customary form, however, is too gen- 
eral and leads to contradictions.) It seems to me that this fact lends addi- 
tional support to our assumption that the explicandum which the classical 
authors had in mind during most of their discussions is probability₁ or 
something similar to it. I formulate this assumption with these cautious 
restrictions because it seems to me that there is no one meaning of the 
term ‘probability’ which is applied with perfect consistency throughout 
his work by any of the classical authors. There are some places where, I 
think, the interpretation as probability₁ makes no good sense while the 
interpretation as probability₂ does. (Examples are the references to “un- 
known probabilities”; see below, § 41D.) 

Our interpretation of the classical theory in terms of probability₁ is in agree- 
ment with the view of Jeffreys, who offers forceful arguments in favor of this 
interpretation as against one in terms of frequency; one strong argument is 
simply the characteristic title Ars conjectandi of Bernoulli’s book. Jeffreys 
comes to the following conclusion: “I maintain that the work of the pioneers 
[Bernoulli, Bayes, and Laplace] shows quite clearly that they were concerned 
with the construction of a consistent theory of reasonable degrees of belief, and 
in the cases of Bayes and Laplace, with the foundations of common sense or in- 
ductive inference” ([Probab.], p. 335). 


With respect to those later writers who follow the classical tradition 
the situation is quite similar. In spite of psychologistic formulations, it is 
usually quite clear that they have an objectivist conception. We may per- 
haps have some doubt in this respect in the case of De Morgan because of 
his persistent formulations in terms of primitive psychologism. But even 
here we find that finally the author not only takes the saving step from 
primitive to qualified psychologism but regards this step merely as a 
transition from a natural, though not quite adequate, formulation to a 
more correct one rather than as a change in the conception itself: “ ‘It is 
more probable than improbable’ means . . . ‘I believe that it will happen 
more than I believe that it will not happen’. Or rather, ‘I ought to believe, 
etc.’ ” ([Logic], pp. 172 f.). [Incidentally, a formulation like ‘It is more 
probable than improbable that it will rain’, used by some authors, seems 
a somewhat jumbled way of saying ‘It is more probable that it will rain 
than that it will not rain’; it is like saying: ‘I believe that it will rain more 
than I disbelieve that it will rain’.] 

It seems to me that, on the basis of the discussions of this section, it is 
plausible to assume that for most, perhaps for practically all, of those au- 
thors on probability who do not accept a frequency conception the follow- 
ing holds. (i) Their theories of probability are objectivistic; the frequent 
formulations of psychologism, qualified or even primitive, are usually only 
preliminary remarks not affecting their actual working method. (ii) The 
objective concept which they mean, clearly or vaguely, as their explican- 
dum is something similar to probability₁; in the classical period the expli- 
candum is often not yet quite clear; but it seems that in the course of the 
historical development the concept of probability₁ emerges more and more 
clearly. 

It cannot, of course, be denied that there is also a subjective, psycho- 
logical concept for which the term ‘probability’ may be used and some- 
times is used. This is the concept of the degree of actual, as distinguished 
from rational, belief: ‘the person X at the time t believes in h to the de- 
gree r’. This concept is of importance for the theory of human behavior, 
hence for psychology, sociology, economics, etc. But it cannot serve as a 
basis for inductive logic or a calculus of probability applicable as a general 
tool of science. 


CHAPTER III 


DEDUCTIVE LOGIC 


In this chapter (§§ 14-40) the language systems 𝔏 are constructed, to which 
our theory of inductive logic will later be applied; and as much of the deductive 
logic with respect to these language systems is outlined as is necessary as a basis 
for the later construction of inductive logic. 

The first part of this chapter (§§ 14-20) gives the semantical foundations of 
deductive logic. The knowledge of this part is presupposed already in the next 
chapter, while the study of the other parts may be postponed until their mate- 
rial is used in later chapters. The language systems 𝔏 are constructed as systems 
of semantical rules. There is one system 𝔏∞ with an infinite number of individu- 
als, and other systems 𝔏N with a finite number N of individuals. The rules of 
formation determine the ways in which the signs of the systems 𝔏 (§ 15) may 
be combined into sentences (§ 16). We use individual variables as the only 
variables (hence our systems correspond to what is known in symbolic logic 
as the lower functional logic). The rules of truth give sufficient and necessary 
conditions for the truth of the sentences (§ 17). Certain sentences which com- 
pletely describe all individuals with respect to all properties and relations ex- 
pressible in the system are called state-descriptions (ℨ) (D18-1); they repre- 
sent all possible states of affairs for the whole domain of individuals. The rules 
of ranges determine for every sentence i in which of the state-descriptions it 
holds (D18-4); the class of these state-descriptions is called the range of i 
(ℜᵢ, D18-6a). In this way the rules give an interpretation of the language sys- 
tem, i.e., they determine the meaning of every sentence; for to know the mean- 
ing of a sentence is to know in which of all possible cases it would be true. The 
rules, by determining the ranges, serve also as a basis for what we call the 
L-concepts (§ 20). For instance, a sentence is said to be L-true (logically true, 
analytic) if it holds in all possible cases, hence if its range comprises all state- 
descriptions (D20-1a); other L-concepts, e.g., L-falsity, L-implication, L-equiv- 
alence, are likewise defined on the basis of the concept of range (D20-1). De- 
ductive logic may be regarded as the theory of the L-concepts; hence, in our 
method, it is based on the concept of range. In a later chapter we shall define 
functions representing the degree of confirmation likewise with the help of the 
concept of range; thus, inductive logic will likewise be based on the concept 
of range. 
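
The relation between state-descriptions, ranges, and L-truth can be tried out on a miniature system. The following Python sketch is only an illustration under stated assumptions: the system has two individuals ‘a’, ‘b’ and two primitive predicates ‘P’, ‘Q’ of degree one; sentences are modeled as Python functions from a state-description to a truth value; and L-implication is taken as inclusion of ranges. All names in the code (`STATE_DESCRIPTIONS`, `range_of`, `l_true`, `l_implies`, and so on) are hypothetical and do not belong to the systems described in this chapter.

```python
from itertools import product

INDIVIDUALS = ("a", "b")
PREDICATES = ("P", "Q")

# Atomic sentences such as ('P', 'a'), standing for 'Pa'.
ATOMS = [(p, x) for p in PREDICATES for x in INDIVIDUALS]

# A state-description settles every atomic sentence; here it is represented
# by the set of atomic sentences that it affirms.
STATE_DESCRIPTIONS = [
    frozenset(atom for atom, holds in zip(ATOMS, values) if holds)
    for values in product((True, False), repeat=len(ATOMS))
]

def atom(p, x):
    return lambda sd: (p, x) in sd

def neg(s):
    return lambda sd: not s(sd)

def disj(s1, s2):
    return lambda sd: s1(sd) or s2(sd)

def universal(p):
    # 'For every x, p(x)', evaluated over the finite domain.
    return lambda sd: all((p, x) in sd for x in INDIVIDUALS)

def range_of(sentence):
    """Class of the state-descriptions in which the sentence holds."""
    return frozenset(sd for sd in STATE_DESCRIPTIONS if sentence(sd))

def l_true(sentence):
    """L-true: the range comprises all state-descriptions."""
    return range_of(sentence) == frozenset(STATE_DESCRIPTIONS)

def l_implies(s1, s2):
    """L-implication, taken here as inclusion of ranges."""
    return range_of(s1) <= range_of(s2)

pa = atom("P", "a")
print(len(STATE_DESCRIPTIONS))           # 16
print(l_true(disj(pa, neg(pa))))         # True
print(l_true(pa))                        # False
print(l_implies(universal("P"), pa))     # True
```

With four atomic sentences there are 2⁴ = 16 state-descriptions; ‘Pa’ holds in half of them, while ‘Pa ∨ ~Pa’ holds in all and is therefore L-true in this miniature system.
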

The second part of this chapter (§§ 21-24) lists theorems of deductive logic 
for later reference, most of them well known. They deal with the connectives of 
propositional logic (§ 21), general sentences (§ 22), replacements (§ 23), and 
identity (§ 24). 

The third and largest part (§§ 25-38) deals with special topics of deductive 
logic, selected because of their importance for inductive logic. Concepts applied 
to predicates, or to the properties and relations designated by them, are defined 
(§ 25). Isomorphism of sentences is defined (D26-3a). This concept, especially 
in its application to state-descriptions, will later be of great importance in in- 
ductive logic. If two state-descriptions are isomorphic (§ 27), they may be said 
to attribute to the realm of individuals the same structure. Certain sentences, 
which describe the possible structures, are called structure-descriptions (𝔖𝔱𝔯, 
D27-1). The most important special kind of our language systems 𝔏 comprises 
those whose primitive predicates designate only properties, not relations; 
they are called the systems 𝔏^π (§ 31). These systems are dealt with in detail 
(§§ 31-38). Predicates of a special kind, ‘Q₁’, ‘Q₂’, etc., are introduced (§ 31). 
The Q-properties designated by these Q-predicates are the strongest properties 
expressible in the system. If a state-description is given, then we call the cardi- 
nal numbers of the Q-properties the Q-numbers of that state-description (§ 34). 
Isomorphic state-descriptions have the same Q-numbers, and any structure is 
completely characterized by its Q-numbers. In a later chapter the Q-numbers 
will be used for determining the degree of confirmation. Some deductive prop- 
erties of universal laws are discussed (§ 37), in particular of laws of conditional 
form (§ 38). 

In the last section of this chapter (§ 40) some mathematical definitions and 
theorems are listed for reference in this and later chapters. 
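
The remarks on Q-properties, Q-numbers, and structures in the summary above can be illustrated by a small computation. The Python sketch below rests on assumptions made only for the illustration: a system with two primitive predicates of degree one (called `P1`, `P2` here) and three individuals; a Q-property is represented by one truth value for each primitive predicate, and a state-description by the assignment of one Q-property to each individual. The names `Q_PROPERTIES`, `q_numbers`, and `structures` are hypothetical.

```python
from collections import Counter
from itertools import product

PREDICATES = ("P1", "P2")
INDIVIDUALS = ("a", "b", "c")

# A Q-property fixes, for every primitive predicate, whether it applies.
# With 2 predicates there are 2**2 = 4 Q-properties.
Q_PROPERTIES = list(product((True, False), repeat=len(PREDICATES)))

# In a system with properties only, a state-description assigns exactly one
# Q-property to each individual (position k belongs to INDIVIDUALS[k]).
STATE_DESCRIPTIONS = list(product(Q_PROPERTIES, repeat=len(INDIVIDUALS)))

def q_numbers(state_description):
    """Number of individuals having each Q-property, in a fixed order."""
    counts = Counter(state_description)
    return tuple(counts[q] for q in Q_PROPERTIES)

# State-descriptions with the same Q-numbers differ only in which
# individuals carry which Q-property; they are grouped into one structure.
structures = Counter(q_numbers(sd) for sd in STATE_DESCRIPTIONS)

print(len(STATE_DESCRIPTIONS))   # 64
print(len(structures))           # 20
print(structures[(3, 0, 0, 0)])  # 1
print(structures[(1, 1, 1, 0)])  # 6
```

Grouping by Q-numbers reduces the 64 state-descriptions of this miniature system to 20 structures; a structure such as (1, 1, 1, 0) is shared by six isomorphic state-descriptions, one for each way of distributing the three individuals.
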


§ 14. Preliminary Explanations 


The importance of an exact description of the object languages for induc- 
tive logic is emphasized. As metalanguage, English will be used, supplemented 
by German letters and other special signs. 


The present chapter does not deal with probability or inductive logic 
but supplies the necessary foundations for our later discussions of these 
topics. Here, we shall describe certain language systems 𝔏, and we shall 
outline a deductive logic for these systems. In later chapters, possibilities 
of inductive logic, that is, a theory of probability₁ (degree of confirma- 
tion), will be discussed in application to these language systems 𝔏 and 
based upon the deductive logic to be outlined here. 

This chapter consists of three parts, only the first of which is presup- 
posed in the next chapter. (i) The first sections (§§ 14-20) describe the 
systems 𝔏 and explain some semantical concepts in application to these 
systems. These concepts will be used continually in the later chapters. 
Therefore it seems advisable for the reader to become acquainted with 
them and their chief characteristics; but it is not necessary to study now 
all theorems given for them; the most important definitions and theo- 
rems are marked by ‘+’. (ii) Some subsequent sections (§§ 21-24) con- 
tain chiefly well-known material of deductive logic. They are written in 
the first place for purposes of later reference, not so much for reading. 
Of the same nature is the last section in this chapter (§ 40); it lists some 
mathematical definitions and theorems. (iii) The remaining sections 
(§§ 25-38) deal with special topics in deductive logic which are needed 
for certain later chapters. The reader who is impatient to come to induc- 
tive logic as soon as possible may skip them at present, to return to the one 
or the other of them only later when the need arises and indications are 
given. (The material of §§ 25-27 will be needed in chap. viii; that of §§ 31 
and 32 in § 107.) 

When we come to inductive logic, we shall see that it is even more nec- 
essary there than in deductive logic to describe the whole structure of the 
language to which it is to be applied; that is to say, the value of the de- 
gree of confirmation for two given sentences is dependent not only upon 
the two sentences but also upon the particular features of the language to 
which the sentences belong. Although many contemporary authors have 
used symbolic logic in their discussions on probability₁ (for instance, 
Keynes, Jeffreys, Mazurkiewicz, Hosiasson), none of them, let alone 
earlier authors, has paid sufficient attention to the language structure. 
In my view, this is a serious defect of most theories from the classical pe- 
riod up to our time; it is responsible for certain difficulties and even for 
contradictions resulting from certain principles in their customary form. 
Therefore it is essential that we specify our language systems in detail 
before applying inductive logic to them. 

For the language systems to be constructed here we choose a relatively 
simple structure, with individual variables as the only variables. [This 
structure corresponds approximately to what is known as the lower func- 
tional logic with only individual variables, or (in the terminology of 
Alonzo Church, [Dictionary], p. 174) a simple applied functional calculus 
of first order.] The actual language of science and even that of elementary 
physics has, of course, a much more complex structure; space-time points 
are represented by their coordinates, and hence real number variables are 
required; events are described in a quantitative way, with the help of 
physical functions with numerical values. However, it seems advisable 
not to try the construction of an inductive logic immediately for a lan- 
guage of this complex form but to begin with simpler structures. Deduc- 
tive logic, which is more than two thousand years older than inductive 
logic, was likewise first applied to simple language forms. Aristotle’s logic, 
the traditional logic based upon it, and even the first systems which used 
the exact, symbolic methods of modern logic (constructed by Boole and 
his followers) deal only with sentence forms which constitute a small frac- 
tion of those in the systems to be here constructed. Frege was the first (in 
Begriffsschrift [1879]) to construct a system of deductive logic for a lan- 
guage form which reaches the complexity of the one we shall use here and 
even goes far beyond it. Later, some indications will be made concerning 
possible ways for solving the problems of extending our system of induc- 
tive logic to more comprehensive languages. These problems concern 
especially languages containing a basic order of the individuals (§ 15) 
and those containing quantitative physical concepts. 

Since we intend to construct inductive logic as a theory of degree of 
confirmation, based upon the meanings of the sentences involved—in 
contradistinction to a mere calculus—we shall construct the language 
systems 𝔏 with an interpretation, hence as systems of semantical rules, not 
as uninterpreted syntactical systems. The systems 𝔏, our object languages, 
are symbolic systems, containing customary symbols of symbolic logic 
and some letters as nonlogical constants. This book presupposes some 
knowledge of the simplest elements of symbolic logic; but it does not pre- 
suppose acquaintance with the semantical method as developed in [Se- 
mantics]. The semantical concepts here used will be explained to the 
extent necessary for the purposes of this book. 


Those readers who wish to obtain a fuller understanding of the semantical 
method, and especially the semantical L-concepts, may be referred to the more 
detailed discussions in [Semantics] and [Meaning]. The broader field of semiotic, 
the general theory of signs, of which semantics forms a part, is briefly sketched 
in Charles Morris’ Foundations of the theory of signs (= “Encyclopedia of uni- 
fied science,” Vol. I, No. 2 [1938]), and surveyed in greater detail in his Signs, 
language, and behavior (1946). 

As metalanguage in which we describe the systems 𝔏 and formulate 
theorems of deductive logic concerning these systems and later the theo- 
rems of inductive logic, we use the English language supplemented by 
some technical signs, especially German letters, as follows: 

‘𝔦𝔫’ refers to the individual constants (of the systems 𝔏 in general or of 
    the one under discussion), 
‘𝔦’ to the individual variables, 
‘𝔭𝔯’ to the primitive predicates, 
‘𝔄’ refers to any expressions (that is, single signs or finite sequences of 
    signs), 
‘𝔖’ to the sentences, 
‘𝔐’ to (sentential) matrices (that is, either sentences or expressions 
    of analogous forms but with free variables, e.g., ‘Px’), 
‘𝔎’ to classes of sentences (sometimes also to classes of other 
    expressions); 
further (see later explanations): ‘ℨ’ refers to state-descriptions, ‘ℜ’ to 
ranges (§ 18), and ‘𝔖𝔱𝔯’ to structure-descriptions (§ 27). 

We adopt the term ‘matrix’ from Quine, because the more customary terms 
‘propositional function’ or ‘sentential function’ are misleading (see [Semantics], 
pp. 232 f.). Following the usage of most mathematicians, the term ‘function’ is 
applied in this book only to certain concepts (e.g., ‘c-functions’), but not to 
linguistic expressions. 

These German letters will be used in two ways. (i) A German letter 
without subscript will sometimes be used as a convenient abbreviation for 
the corresponding English noun or phrase (or its plural form); for ex- 
ample, we shall sometimes write ‘all ℨ are . . .’ as short for ‘all state- 
descriptions are . . .’, ‘this system contains three 𝔭𝔯’ for ‘. . . three primi- 
tive predicates’, ‘this sentence contains no 𝔦𝔫’ for ‘. . . no individual con- 
stant’, etc. (ii) A German letter with one of the subscripts ‘i’, ‘j’, etc., 
serves as a variable of the metalanguage for reference to the kind of signs 
or expressions of the systems indicated above. (Less frequently, a Ger- 
man letter with one of the subscripts ‘1’, ‘2’, etc., is used as a constant of 
the same kind.) For instance, a formulation like: ‘If 𝔖ᵢ L-implies 𝔖ⱼ, then 
the negation of 𝔖ⱼ L-implies the negation of 𝔖ᵢ’ is to be understood as 
saying: ‘If a first sentence L-implies a second sentence (which is not neces- 
sarily different from the first), then the negation of the second L-implies 
the negation of the first’. Since the arguments of degree of confirmation 
are sentences, our discussions and theorems will contain very many ref- 
erences to sentences; therefore it is convenient to have simpler signs for 
these references. For this reason, we shall, instead of ‘𝔖ᵢ’, ‘𝔖ⱼ’, ‘𝔖ₖ’, ‘𝔖ₗ’, 
usually write simply ‘i’, ‘j’, ‘k’, ‘l’; ‘e’ and ‘h’ are used in the same way. 
Note that these letters, although italics, belong, not to the symbolic 
systems 𝔏, but to the metalanguage, that is, they are used, like German 
letters, in the English context. 

Some other German letters are used in the metalanguage, not as desig- 
nations for expressions of the object languages, but for certain semantical 
concepts of inductive logic; these are chiefly the functors ‘𝔪’ (measure 
function, § 55A) and ‘𝔠’ (degree of confirmation, § 55A); furthermore, in 
some special chapters, the functors ‘𝔯’ (relevance measure, § 67) and ‘𝔢’ 
(estimate, § 99) and the predicates ‘𝔐ℭ’ (the comparative concept of 
confirmation, § 79), ‘ℭ’ (the classificatory concept of confirmation, § 86), 
and others. ‘𝔏’ is used for the semantical systems to be constructed here. 

In connection with the use of German letters, we lay down two con- 
ventions. The first one is customary. 


Convention 14-1. A name in the metalanguage for a compound expres- 
sion of the object language is formed by simple juxtaposition of the 
names (or variables) for the signs of which the compound expression 
consists. 


For example, if ‘𝔭𝔯₁’ refers to ‘R’, ‘𝔦𝔫₁’ to ‘a’, and ‘𝔦𝔫₂’ to ‘b’, then 
‘𝔭𝔯₁𝔦𝔫₁𝔦𝔫₂’ refers to ‘Rab’. Further, for the sake of simplifying the symbolic 
notation in the metalanguage, we shall permit the use of symbols of the 
object languages as names of themselves, provided the occurrence of a 
symbol of the metalanguage, e.g., a German letter, makes it clear that the 
whole expression belongs to the metalanguage. Because of this restricting 
condition, no ambiguity can arise. Hence we lay down the following con- 
vention: 


Convention 14-2. If a compound symbolic expression contains a Ger- 
man letter (or one of the letters ‘e’, ‘h’, . . . , ‘l’, which are equivalent to 
German letters) or ‘⊢’ (see below), then the whole expression is to be un- 
derstood as an expression of the metalanguage, and any symbol of the 
object language occurring in it is to be understood as a name of itself, 
that is, as if it were included in quotation marks or replaced by a corre- 
sponding German letter. 

We shall, however, make use of the notation allowed by this convention 
only in the following two ways: 

(a) In order to form the name of an expression (according to Convention 
14-1), we very often take as names of symbols of the object language 
(usually nonletter symbols or ‘t’) these symbols themselves, as is cus- 
tomary. (For example, if ‘i’ refers to ‘Pa’, and ‘j’ to ‘Qb’, then ‘(~i) ∨ j’ 
refers to ‘(~Pa) ∨ Qb’; hence the connectives and parentheses are used 
here as names of themselves.) 

(b) We write occasionally (not frequently) an expression of the object 
language instead of its name when it occurs as an argument expression 
following either a German letter functor (e.g., ‘𝔪’, ‘𝔠’) or a German let- 
ter predicate (e.g., ‘𝔐ℭ’) of the metalanguage. (For example, we might 
write ‘𝔠(Pb, e)’ instead of ‘𝔠(‘Pb’, e)’ or ‘𝔠(𝔭𝔯₁𝔦𝔫₂, e)’.) 

Some other special signs are used in the metalanguage in combination 
with German letters. We take ‘⊢’ for ‘L-true’ (to be explained later, 
§ 20); thus we write ‘⊢ i’ as short for ‘i is L-true (in the system in ques- 
tion)’. Hence (according to later explanations) ‘⊢ i ⊃ j’ means the same 
as ‘i L-implies j (in the system in question)’ (thus the earlier example 
will be written like this: ‘If ⊢ i ⊃ j, then ⊢ ~j ⊃ ~i’). The following signs 
are used in combinations with class expressions in the metalanguage. 
‘𝔎ᵢ ⊂ 𝔎ⱼ’ is written as short for ‘𝔎ᵢ is a subclass of 𝔎ⱼ’; ‘i ∈ 𝔎ⱼ’ for ‘i 
belongs to (is an element of) 𝔎ⱼ’; ‘𝔎ᵢ ∪ 𝔎ⱼ’ for ‘the class-sum of 𝔎ᵢ and 
𝔎ⱼ’; ‘𝔎ᵢ ∩ 𝔎ⱼ’ for ‘the class-product of 𝔎ᵢ and 𝔎ⱼ’; ‘−𝔎ᵢ’ for ‘the com- 
plement-class of 𝔎ᵢ’ (i.e., ‘the class of all sentences not belonging to 𝔎ᵢ’); 
‘𝔎ᵢ − 𝔎ⱼ’ for ‘𝔎ᵢ ∩ (−𝔎ⱼ)’. ‘{i}’ designates the class whose only ele- 
ment is i; ‘{j₁, j₂, . . . , jₙ}’ the class whose only elements are j₁, j₂, . . . , jₙ. 
‘=Df’ is used as sign of definition in the metalanguage. ‘( )(𝔐ₖ)’ is short 
for ‘(𝔦ₖ₁)(𝔦ₖ₂) . . . (𝔦ₖₙ)(𝔐ₖ)’, where 𝔦ₖ₁, 𝔦ₖ₂, . . . , 𝔦ₖₙ are the variables oc- 
curring freely in 𝔐ₖ, in the order of increasing subscripts. 
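
The class signs just introduced behave like the familiar operations on finite sets, and the contraposition example can be checked mechanically. The Python sketch below is an illustration only, under two assumptions that go beyond the text: the stock of sentences is a small finite class (`ALL_SENTENCES`, a hypothetical name), so that complement-classes can be computed; and L-implication is represented as inclusion between ranges over three hypothetical cases. None of the Python names belongs to the metalanguage described here.

```python
from itertools import combinations

# Hypothetical finite stock of sentences, so that complements are computable.
ALL_SENTENCES = frozenset({"Pa", "Pb", "Qa", "Qb"})

K_i = frozenset({"Pa", "Pb"})
K_j = frozenset({"Pb", "Qa"})

def complement(k):
    """Class of all sentences of the stock not belonging to k."""
    return ALL_SENTENCES - k

class_sum = K_i | K_j        # class-sum of K_i and K_j
class_product = K_i & K_j    # class-product of K_i and K_j
difference = K_i - K_j       # short for the product of K_i with -K_j

print(difference == K_i & complement(K_j))   # True
print(class_product <= K_i)                  # True: a subclass of K_i

# Contraposition, with ranges as classes of cases (an assumption of this
# sketch): i L-implies j when the range of i is included in that of j, and
# the range of a negation is the complement of the range.
CASES = frozenset(range(3))

def subclasses(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def l_implies(range_i, range_j):
    return range_i <= range_j

contraposition_holds = all(
    l_implies(CASES - r_j, CASES - r_i)
    for r_i in subclasses(CASES)
    for r_j in subclasses(CASES)
    if l_implies(r_i, r_j)
)
print(contraposition_holds)                  # True
```

The last check runs through all 64 pairs of ranges over the three cases and confirms that, whenever the first is included in the second, the complement of the second is included in the complement of the first.
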


This book makes use of symbolic logic and presupposes some elementary 
knowledge in this field. All the symbols used will be explained in the next sec- 
tion. Elementary introductions to symbolic logic: Alfred Tarski, Introduction 
to logic (New York, 1941), John Cooley, A primer of formal logic (New York, 
1942), Hans Reichenbach, Elements of symbolic logic (New York, 1947). Sys- 
tematic works on a higher technical level: Whitehead and Russell [Princ. 
Math.], which is the great standard work in the field, and Quine [Math. Logic], 
which constructs a system of a new form. 


§ 15. The Signs of the Systems 𝔏 


A. The infinite system 𝔏∞ contains an infinite sequence of individual con- 
stants (𝔦𝔫): ‘a₁’, ‘a₂’, etc. Any finite system 𝔏N contains only the first N of them. 
All other signs are the same in all systems. There is a finite number of primitive 
predicates (𝔭𝔯) of any degrees. There is an infinite sequence of individual 
variables (𝔦): ‘x₁’, ‘x₂’, etc.; they are the only variables. There are universal 
quantifiers, and the customary symbols for identity, negation, disjunction, and 
conjunction. The customary symbols for existence, conditional, biconditional, 
and nonidentity are introduced as unofficial abbreviations (A1). B. Some indi- 
cations are made concerning possibilities for the construction of a more compre- 
hensive language system describing a basic order of the individuals and for a 
method of inductive logic suitable to that system. 


A. The Signs Occurring in Our Systems


Our language systems 𝔏 comprise one infinite system 𝔏_∞ and the finite
systems 𝔏_N; the latter form an infinite sequence of systems with N running
through all positive integers: 𝔏_1, 𝔏_2, 𝔏_3, etc.

The system 𝔏_∞ contains an infinite sequence of individual constants
(in): ‘a₁’, ‘a₂’, ‘a₃’, etc. (in examples we shall sometimes use ‘a’, ‘b’,
‘c’, etc.); they refer to all the individuals in the domain of individuals
(universe of discourse) of 𝔏_∞. These individuals may be things, events,
positions, or the like. Further, 𝔏_∞ contains a finite number of primitive
predicates (pr) of any degree (i.e., number of arguments). Those of
degree one, for example, ‘P₁’, ‘P₂’, etc., designate properties of
individuals; those of degree two, for example, ‘R₁’, ‘R₂’, etc., designate
dyadic relations between individuals; and so on. Properties and relations
together will be called attributes. We do not specify the number of pr
beforehand; sometimes we shall do so in order to make the description of
𝔏_∞ and the other systems more specific. Further, we do not lay down an
interpretation for the pr or the in because the choice of a particular
interpretation is irrelevant for both deductive and inductive logic. Thus,
what we shall actually construct is, strictly speaking, not a semantical
system but, so to speak, a skeleton of a semantical system. We assume that
for any concrete application of deductive or inductive logic the systems
are supplemented in the following way: (i) a finite number of pr is chosen
and their degrees are specified; (ii) an interpretation for these pr is
given by rules of designation, that is, semantical rules of a form like
this: ‘pr₁ designates the property Blue’; (iii) the in are interpreted by a
general rule of designation of the following form: ‘With respect to such
and such an infinite sequence of entities, the nth individual constant
(i.e., ‘aₙ’) designates the nth entity in the sequence’; this rule is
assumed to be such that we can see from it alone without the use of factual
knowledge that any two different individual constants designate different
entities. (The interpretations of the pr and the in must fulfil a
requirement of logical independence to be explained later, § 18B.) We shall
speak of a class of individuals usually only in case the individuals are
given by an enumeration with the help of individual constants, but not in
case they are characterized by a common property. Thus, for example, we
shall say ‘the class of the individuals a, b, c’ or ‘the class of the
individuals referred to in the sentence e’ (meaning ‘the class of the
individuals whose in occur in e’); but we shall not say ‘the class of those
individuals which are P₁’ or ‘the class Blue’, but rather ‘the property (of
being) P₁’ or ‘the property Blue’.

If we were only concerned with deductive logic, there would not be
much reason to construct finite systems in addition to 𝔏_∞. However, we
shall see that the construction of an inductive logic is made technically
simpler if we apply it, not immediately to 𝔏_∞, but first to finite systems
and then with their help to 𝔏_∞. 𝔏_N contains only the N first individual
constants of 𝔏_∞; hence 𝔏_1 contains only ‘a₁’; 𝔏_2 contains ‘a₁’ and ‘a₂’;
etc. The in in 𝔏_N designate the same individuals as in 𝔏_∞; thus the
individual domain of each finite system is a part of that of 𝔏_∞. It is
clear that every individual constant of 𝔏_∞ occurs also in some finite
systems, and indeed in infinitely many such systems; for a given n, ‘aₙ’
occurs in every 𝔏_N with N ≥ n. Since all the other signs are the same in
all systems, every sentence of 𝔏_∞ occurs in infinitely many finite
systems; if ‘aₙ’ is the individual constant with the highest subscript
occurring in the sentence i of 𝔏_∞, then i occurs also in every 𝔏_N with
N ≥ n.
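
As an illustration that is not part of the original text, the relation between a sentence and the finite systems in which it occurs can be sketched in Python. The ASCII spelling 'a1', 'a2', ... for the individual constants and both function names are assumptions made only for this sketch.

```python
import re


def smallest_system(sentence: str) -> int:
    """Return the least N such that the sentence occurs in the finite system L_N.

    Assumption: individual constants are spelled 'a1', 'a2', ... in ASCII.
    """
    subscripts = [int(digits) for digits in re.findall(r"a(\d+)", sentence)]
    # A sentence containing no individual constant occurs already in L_1.
    return max(subscripts, default=1)


def occurs_in(sentence: str, n: int) -> bool:
    """True if the sentence occurs in L_N for N = n, i.e. if n >= highest subscript."""
    return n >= smallest_system(sentence)
```

For a sentence whose highest constant is 'a5', the sketch reports that it occurs in every finite system from the fifth onward and in none before it.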

Every system 𝔏, finite or infinite, contains an infinite number of
individual variables (i): ‘x₁’, ‘x₂’, ‘x₃’, etc. (or ‘x’, ‘y’, ‘z’, etc.).
The individual constants and individual variables are together called
individual signs. The values of these variables in a given system are the
individuals of that system, that is, the individuals designated by the in
of that system. Every system contains universal quantifiers with individual
variables; ‘(x)(Px)’ means ‘for every individual x (of the domain of
individuals of the system in question), x is P’. The existential
quantifiers (e.g., ‘(∃x)’, ‘there is an individual x (of the system in
question)’) do not occur in the systems themselves; but we shall introduce
them by the customary definition for the purpose of convenient, unofficial
abbreviations (A1c). According to the given explanation, the sentence
‘(x)(Px)’ in 𝔏_N means the same as the conjunction ‘Pa₁ . Pa₂ . ... . Pa_N’
with N components; and we shall lay down the semantical rules in such a way
that these two sentences are L-equivalent. Thus, in different finite
systems, the universal sentence ‘(x)(Px)’ has different meanings. And in
𝔏_∞ the same sentence has again a different meaning because it says
something about the infinitely many individuals. In 𝔏_∞ the scope ‘Px’ has
an infinite number of instances ‘Pa₁’, ‘Pa₂’, etc.; therefore we cannot form
a conjunction out of them; but the universal sentence is L-equivalent to the
infinite class of these instances.
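
The dependence of the universal sentence on the system can be made concrete by a small sketch (an illustration added here, not the author's own formulation). It assumes a hypothetical universe in which the individuals are numbered 1, 2, 3, ... and in which every individual except the fourth has the property P; the function names are likewise assumptions.

```python
from typing import Callable


def universal_holds(has_p: Callable[[int], bool], n: int) -> bool:
    """Truth of '(x)(Px)' in the finite system L_N with N = n."""
    return all(has_p(k) for k in range(1, n + 1))


def conjunction_holds(has_p: Callable[[int], bool], n: int) -> bool:
    """Truth of the conjunction 'Pa1 . Pa2 . ... . PaN' with N = n components."""
    result = True
    for k in range(1, n + 1):
        result = result and has_p(k)
    return result


def is_p(k: int) -> bool:
    """Hypothetical facts: every individual except the fourth is P."""
    return k != 4
```

Under these assumed facts the same string of marks is true in the system with three individuals and false in the system with four, while in each finite system it agrees with the corresponding conjunction.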

It is important to realize clearly the fact that the same sentence may 
have different meanings in different systems and hence also different 
properties both in deductive and in inductive logic. Perhaps a reader 
might think that, although we have the same string of marks in different 
systems, we cannot properly speak here of the same sentence if the mean- 
ings are different. However, we have decided to understand by the term 
‘sentence’ just the string of marks (more exactly speaking, a sentence- 
design is a finite sequence of sign-designs, see [Semantics] §§ 2, 3). If 
somebody prefers to use the phrase ‘the same sentence’ only if both the 
signs and the meanings are identical, there is no objection; however, in 
this case we should have to look for another term to take the place of our 
term ‘sentence’. The relationship between ‘(x)(Px)’ as an item occurring
in the system 𝔏_1, and the same string of marks in 𝔏_2, and the same
in 𝔏_∞, is not a mere typographical accident (as is the case, for example,
with ‘~’ in Russell’s and in Hilbert’s notations, where there is no
connection of meanings). The meanings, although different, stand in a close
relationship to each other. The meanings of ‘(x)(Px)’ in the systems 𝔏_N
with increasing N converge, so to speak, toward its meaning in 𝔏_∞; this
fact will later be of great importance in inductive logic for the definition
of degree of confirmation with respect to sentences in 𝔏_∞ (§ 56). For this
reason, our way of speaking of ‘the same sentence’ will be very convenient.

The sentences in our systems contain no free variables. These systems
contain only individual variables, no attribute variables. (This is that
form of the lower functional logic which has been applied more frequently
in recent years and has been shown to be a good working basis for logic.)

All signs except the individual constants occur in all systems 𝔏 alike.
All signs except the individual variables have the same meanings wherever
they occur; hence every sentence without variables has the same
meaning in all systems in which it occurs. Thus the following explanations
of the remaining signs apply to all systems.

The systems contain ‘=’ as the customary sign of identity for indi- 
viduals; this sign is not regarded as a predicate and hence not counted 
among the primitive predicates (pr). As mentioned earlier, it is presup- 
posed that different individual constants designate different individuals; 
that is to say, we shall construct the semantical rules in such a way that 
a full sentence of ‘=’ with two different in (e.g., ‘a₁ = a₂’) becomes
L-false (for explanation of the terms with ‘L-’, see § 20). Further, we shall
of course make the rules such that an =-sentence with two occurrences
of the same in will be L-true. Hence all =-sentences will be L-determinate.
‘≠’ will be defined as sign for nonidentity (A1d).

The systems 𝔏 contain ‘t’ as a tautological sentence. It would, of course,
be possible to define ‘t’ as abbreviation for some tautological sentence
(e.g., for ‘P₁a₁ ∨ ~P₁a₁’). However, we prefer to take it as a primitive
sign belonging to the systems themselves; this seems convenient for the
construction of normal forms.

Of the customary connectives, only the signs of negation (‘~’, meaning
‘not’), disjunction (‘∨’, ‘or’ in the nonexclusive sense), and conjunction
(‘.’, ‘and’) occur in the systems themselves. The signs of the conditional
(‘⊃’, meaning ‘if—then’) and of the biconditional (‘≡’, ‘if and
only if’) will be introduced by their customary definitions (A1a and b).
Following Quine, we regard these and all other defined signs not as
belonging to the systems themselves; an expression containing a defined sign
serves, so to speak, as shorthand for the corresponding expanded expression
in primitive notation. (Concerning a later deviation from this procedure
see remark preceding D33-1.)

The definitions introducing abbreviations for expressions of the object
languages, i.e., the systems 𝔏, are marked by ‘A’; the much more frequent
definitions introducing words, phrases, or signs (e.g., German letters) into
the metalanguage are marked by ‘D’, theorems by ‘T’, etc.; each of these
letters ‘A’, ‘D’, ‘T’, etc., is followed by two numerals (e.g., ‘A15-1’); the
first gives the number of the section, the second that of the particular
item. For references within the same section the first numeral is omitted
(for example, a reference ‘A1a’ in this section refers to A15-1a). The
more important definitions, theorems, etc., are marked by ‘+’ (e.g.,
D18-1).

A15-1. Expressions containing the signs ‘⊃’, ‘≡’, ‘∃’, and ‘≠’ will be
used as unofficial abbreviations in the following way.

a. 𝔐_i ⊃ 𝔐_j for (~𝔐_i) ∨ 𝔐_j.

b. 𝔐_i ≡ 𝔐_j for (𝔐_i ⊃ 𝔐_j) . (𝔐_j ⊃ 𝔐_i).

c. (∃𝔦_k)(𝔐_i) for ~(𝔦_k)(~𝔐_i).

d. 𝔄_i ≠ 𝔄_j for ~(𝔄_i = 𝔄_j).


For example, according to A1a, we shall write a partial expression
‘P₁x ⊃ P₂y’ within a sentence as shorthand for ‘(~P₁x) ∨ P₂y’.
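
The expansion of the unofficial abbreviations into primitive notation can be sketched as follows. This is an added illustration under stated assumptions: matrices are represented as nested Python tuples, and the tags 'implies', 'iff', 'exists', 'neq', 'or', 'and', 'not', 'forall', 'eq' are hypothetical names chosen for the sketch, not notation of the systems themselves.

```python
def expand(m):
    """Rewrite a matrix (nested tuples) into primitive notation.

    Individual signs, predicate names and the sentence 't' are plain strings
    and are returned unchanged.
    """
    if isinstance(m, str):
        return m
    op, *args = m
    args = [expand(a) for a in args]  # expand the components first
    if op == "implies":               # abbreviation (a): conditional
        a, b = args
        return ("or", ("not", a), b)
    if op == "iff":                   # abbreviation (b): biconditional
        a, b = args
        return ("and", ("or", ("not", a), b), ("or", ("not", b), a))
    if op == "exists":                # abbreviation (c): existential quantifier
        var, body = args
        return ("not", ("forall", var, ("not", body)))
    if op == "neq":                   # abbreviation (d): nonidentity
        a, b = args
        return ("not", ("eq", a, b))
    return (op, *args)                # already primitive
```

Applied to the example of the text, the conditional between 'P1 x' and 'P2 y' becomes the disjunction of the negated first component with the second.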


B. On the Possibility of an Ordered System 


Some brief remarks may be made concerning possibilities of a future 
development. If inductive logic is to be extended so as to apply to language
systems more comprehensive than our system 𝔏 (for infinitely many
individuals, that is, 𝔏_∞), it might be useful to construct a stronger
system 𝔏′ possessing the following features. First, 𝔏′ refers to a universe
whose individuals exhibit a fixed basic order of the structure of a
progression (i.e., a linear, discrete order with one initial and no terminal
member). For this purpose 𝔏′ contains a symbol (functor) for the concept of
immediate successor in the basic order (for instance, ‘a′’ is written for
‘the successor of a’). This order may be interpreted as a kind of temporal
order of events, and hence the individuals as temporal positions (in this
simplified universe there is only one event at any time-point).
Furthermore, it seems desirable to have in 𝔏′ variables and constants for
natural numbers; arithmetical functions (e.g., sum, product, etc.) can then
be introduced by recursive definitions; thus the arithmetic of natural
numbers can be formulated in 𝔏′. The frequency of a property within a given
class of individuals can then be expressed in a simple way. A system 𝔏′
possessing the features described can be constructed by the following
convenient and simple procedure without the need for a second kind of
variable. Instead of ‘a’, ‘a′’, ‘a″’, etc., we write ‘0’, ‘0′’, ‘0″’, etc.
These expressions are primarily interpreted as expressions for the natural
numbers 0, 1, 2, etc. To each position in the basic linear order a natural
number is assigned as its coordinate: the number 0 to the initial position,
the number 1 to the next following position, etc. An atomic sentence, say,
‘P(0′)’, may then say, for example, that the position with the coordinate 1
is blue. Strictly speaking, ‘P’ stands here for ‘the position with the
coordinate ... is blue’ and ‘0′’ stands merely for ‘(the number) 1’. But it
will then be convenient to allow, for practical purposes, a slightly changed
interpretation to the effect that ‘P’ stands for ‘... is blue’, and ‘0′’ for
‘the position with the coordinate 1’. Thus the individual expressions,
which were primarily interpreted as expressions for numbers, are regarded 
in the secondary interpretation as expressions for positions. This involves 
no actual ambiguity, because theoretically the primary interpretation is 
the only one; the secondary interpretation constitutes merely a convenient 
mode of speech in the metalanguage. In accordance with the secondary 
interpretation, we may then allow ourselves to say that the positions are 
the individuals of the universe of this system. The essential point is that in 
this system the positions as individuals are referred to not by names (like 
‘a’, ‘b’, etc.) but by coordinate expressions. (For a description of a co- 
ordinate language of the structure here indicated see [Syntax] § 3; for a 
general discussion of the semantical character of coordinate languages see 
[Meaning], chap. ii.) Consequently, the individual variables ‘x’, ‘y’, etc.,
which in 𝔏′ likewise are the only variables, are interpreted primarily as
variables for the natural numbers and secondarily as variables for the
positions. The extension of 𝔏 to 𝔏′ may appear to be only slight, but in
fact the logical character of the new system is quite different from that
of 𝔏, even in deductive logic. [For instance, since 𝔏′ contains arithmetic,
according to Goedel’s result it is impossible to construct one calculus in
which all L-true sentences of 𝔏′ are provable.] For the extension of
inductive logic to the new system 𝔏′ there are two possible forms which
could be constructed in two successive steps; we call them forms I and II.
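
The coordinate expressions of the ordered system may be illustrated by a brief sketch (added here; it is not a construction given in the text). It assumes that the accent is written as the ASCII apostrophe, and the names numeral, coordinate, and successor are hypothetical.

```python
def numeral(n: int) -> str:
    """Coordinate expression for the natural number n: '0' followed by n accents."""
    if n < 0:
        raise ValueError("coordinates are natural numbers")
    return "0" + "'" * n


def coordinate(expr: str) -> int:
    """Primary interpretation: the natural number denoted by a coordinate expression."""
    if not expr.startswith("0") or set(expr[1:]) - {"'"}:
        raise ValueError(f"not a coordinate expression: {expr!r}")
    return len(expr) - 1


def successor(expr: str) -> str:
    """Expression for the immediate successor in the basic order."""
    coordinate(expr)  # raises ValueError if expr is not well formed
    return expr + "'"
```

In the secondary interpretation the expression with one accent is read as standing for the position with the coordinate 1, which is exactly what coordinate returns for it.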


Form I of inductive logic. The old definitions of degree of confirmation
constructed in this book for the system 𝔏 (in particular, the definitions of
regular m- and c-functions (§§ 55, 56), symmetrical m- and c-functions
(§§ 90, 91), and c* (§ 110A)) are simply transferred to the system 𝔏′.
[This is possible because the state-descriptions and therefore also the rules
of ranges (§ 18D) remain in 𝔏′ essentially the same.] The main task is
merely to develop, on the basis of those definitions, theorems of inductive
logic which cover also the new sentences of 𝔏′. It seems that this can
be done without great difficulties. The structure of form I of inductive
logic as just described for 𝔏′ is fundamentally the same as that developed
in this book for 𝔏. Although the basic order of the individuals is
expressible in 𝔏′, it is not regarded, in form I, as influencing the degree
of confirmation c. Suppose that the evidence e says that of three observed
individuals two had the property M and one ~M, and the hypothesis h
says that a certain unobserved individual is M. Then, in form I, the value
of c(h, e) is the same no matter whether the individual with ~M is the
first, the second, or the third among the three observed individuals in
their basic order and how the unobserved individual is located in relation
to them. To disregard thus the temporal order of the events is customary
in the traditional theory of probability and even in most parts
of modern mathematical statistics, although in everyday life and in science
regular temporal patterns which we have observed among past events
often have a decisive influence upon our expectations for the future.
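
The independence from the order in form I can be checked on the example just given. The sketch below is an added illustration and rests on a loud assumption: as a stand-in for a confirmation function it uses a rule of succession, (s_M + 1)/(s + 2), chosen only because its value depends on the observed counts alone; it is not presented as the definition which the book adopts, and the name c_form_one is hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def c_form_one(evidence) -> Fraction:
    """Stand-in value of c(h, e) for 'the next individual is M'.

    evidence: booleans in the basic order (True = M, False = ~M).
    Assumption: a rule of succession, which uses only the counts s_M and s.
    """
    s = len(evidence)
    s_m = sum(evidence)
    return Fraction(s_m + 1, s + 2)


# All orderings of "two M and one ~M" among three observed individuals.
values = {c_form_one(order) for order in permutations([True, True, False])}
```

Every ordering of the three observations yields one and the same value, which is the feature of form I described above; a definition of form II would have to break this invariance.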


Form II of inductive logic. The second step consists in constructing new
definitions for the concepts of degree of confirmation so as to take into 
account not only the observed or expected frequencies of the properties 
in question but also the order in which these properties occur. 

In modern mathematical statistics, in distinction to the traditional the- 
ory of probability, temporal sequences are studied (e.g., in the analysis 
of time-series (see Wold [Time Series] and Kendall [Statistics], Vol. II, 
chaps. 29 and 30) and the sequential analysis (see Wald [Sequential])). 
However, these investigations do not show a way for constructing an ad- 
equate explicatum c of the kind described. A preliminary study which I 
have made seems to show that it is not too difficult to construct a fairly 
adequate definition. However, the application of this definition to cases 
which involve many individuals, and the development of general theo- 
rems based on the definition seem to become rather complicated. Here 
arise new and very interesting problems; it remains for future investiga- 
tions to discover whether satisfactory solutions can be found. 

If we wish to express the temporal order of events and have it influence 
the degree of confirmation as in the form II just explained, there is an 
alternative method which can even be applied in our present system 𝔏 on
the basis of the present definitions of degree of confirmation. This method 
consists in designating the relation of temporal priority by a primitive 
predicate (whereupon immediate priority can be defined). If this method 
is chosen, then the order of events is taken into account even by the form 
of inductive logic developed in this book. Here, the temporal order (‘x is 
earlier than y’) is not expressed as a basic positional relation but in analogy 
to an empirical, qualitative relation (e.g., ‘x is warmer than y’). Conse- 
quently, such fundamental characters of the temporal order as asym- 
metry and transitivity are represented in this method as contingent fea- 
tures. As an example consider the following prediction concerning balls 
drawn from an urn: ‘the first red ball which will appear will come earlier 
than the first blue ball; and this blue ball will appear earlier than the red 
one’. This hypothesis must be regarded, on the basis of the method under 
discussion, not as impossible, but at worst as improbable. Its degree of
confirmation on any finite evidence will not be 0 but will have a positive
value. It seems, however, rather doubtful whether this sentence could be 
regarded as expressing a possible outcome of observations. For this and 
other reasons I have some doubt concerning the adequacy of this method. 
I believe that the temporal order and, more generally, the spatiotemporal 
order is to be regarded as a basic positional order rather than a qualitative 
order; in other words, that it is more adequate to represent the spatio- 
temporal order by the form of the individual expressions (coordinate ex- 
pressions) rather than by primitive predicates. At any rate, further clari- 
fication of this problem is required. At the present time not even the na- 
ture of the problem itself is clear. Should it be regarded as a question con- 
cerning the “true nature” of space and time, to be answered by ontological 
or phenomenological methods? I think a more fruitful approach would 
be to construct language systems of both forms—the first expressing 
spatiotemporal relations by primitive predicates, the second by the form 
of coordinate expressions for the positions—and develop inductive logic 
for both of them. Preference will then be given to that language system 
for which a more adequate or more convenient inductive method can be 
developed. 
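
Why the asymmetry of the temporal order becomes a contingent feature under the method of primitive predicates can be seen from a small enumeration. The following sketch is an added illustration; it assumes a hypothetical system with two individuals 'a', 'b' and a single dyadic primitive predicate read as ‘earlier than’, and it lists the possible assignments of truth-values to the atomic sentences (the state-descriptions of § 18).

```python
from itertools import product

individuals = ["a", "b"]
# Atomic sentences E(x, y), read 'x is earlier than y' (hypothetical predicate E).
atoms = [(x, y) for x in individuals for y in individuals]


def state_descriptions():
    """Yield every assignment of truth-values to the atomic sentences."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def asymmetric(sd) -> bool:
    """True if E(x, y) and E(y, x) never hold together (including x = y)."""
    return not any(sd[(x, y)] and sd[(y, x)] for x in individuals for y in individuals)


all_sd = list(state_descriptions())
cyclic = [sd for sd in all_sd if sd[("a", "b")] and sd[("b", "a")]]
```

Of the sixteen state-descriptions only three respect asymmetry, and four make both ‘a is earlier than b’ and ‘b is earlier than a’ hold; since these four count as possible cases, a hypothesis of this cyclic kind cannot receive the value 0 from a function which gives positive weight to every state-description.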


§ 16. The Rules of Formation 
Rules of formation are laid down. They determine the customary forms of 
(sentential) matrices, which include sentences (D2). A sentence is defined as a 
matrix without free variables (D4). Some kinds of matrices (D3) and sentences 
(D6) are defined: atomic, basic (atomic or negation), molecular (without quan- 
tifier or sign of identity), general (with quantifier). 
On the basis of the informal explanations in the preceding section, we 
shall now begin the construction of the systems by laying down their 
semantical rules. In this section we give the first kind of these rules, the 
rules of formation; they state, in the form of definitions, which kinds of 
signs belong to the systems 𝔏 and how sentences are formed out of these
signs.
D16-1. 𝔄_i is a sign in 𝔏 =Df 𝔄_i belongs to one of the following kinds:
a. Individual constants (in). In 𝔏_∞, an infinite number: ‘a₁’, ‘a₂’, ‘a₃’,
etc. (Instead of ‘a₁’, ..., ‘a₅’, we write sometimes ‘a’, ‘b’, ‘c’, ‘d’,
‘e’.) In 𝔏_N, a finite number N: ‘a₁’, ‘a₂’, ..., ‘a_N’.

b. A finite number of primitive predicates (pr) of any degrees: ‘P₁’,
‘P₂’, etc.; ‘R₁’, etc.

c. An infinite number of individual variables (i): ‘x₁’, ‘x₂’, etc. (Instead
of ‘x₁’, ‘x₂’, ‘x₃’, we write sometimes ‘x’, ‘y’, ‘z’.)

d. Seven single signs: ‘~’, ‘∨’, ‘.’, ‘=’, ‘t’, ‘(’, ‘)’.

An expression in 𝔏 is a finite sequence of signs in 𝔏.


D16-2. 𝔄_i is a (sentential) matrix (𝔐) in 𝔏 =Df 𝔄_i consists of signs of 𝔏
and has one of the following forms.

a. pr_i 𝔄_j1 𝔄_j2 ... 𝔄_jn, where pr_i is of degree n and each of the n
argument expressions is an individual sign, i.e., an in or an i.
b. 𝔄_i = 𝔄_j, where 𝔄_i and 𝔄_j are individual signs.
c. t.
d. ~(𝔐_i).
e. (𝔐_i) ∨ (𝔐_j).
f. (𝔐_i) . (𝔐_j).
g. (𝔦_k)(𝔐_i).

In the actual writing of symbolic formulas or of their descriptions in the
metalanguage, we shall usually omit the parentheses including a component
in the forms D2d, e, f or a scope in the form D2g under the customary
conditions: we take the universal and existential quantifiers and
‘~’ as of greatest strength (hence, if one of these is not followed by an
expression included in parentheses, its scope is the smallest matrix
immediately following), then come ‘∨’ and ‘.’, and finally ‘⊃’ and ‘≡’.
(Thus, for example, ‘~t ∨ P₁a ⊃ (x)P₂x . P₃b’ is short for
‘[(~t) ∨ P₁a] ⊃ [((x)(P₂x)) . P₃b]’.) Further, we shall speak in the
customary way of disjunctions and conjunctions with n components for any
n ≥ 1 (for n = 1, the sentence or matrix itself is its only disjunctive or
conjunctive component; for example, if we say: ‘let i be the disjunction of
those sentences which fulfil such and such a condition’ and it turns out
that only j fulfils that condition, then this is meant to say that i is the
sentence j itself).

In D3 some particular kinds of matrices are defined.

D16-3. Let 𝔐_i be a matrix in 𝔏.

a. 𝔐_i is a matrix of degree n (for any n ≥ 0) =Df 𝔐_i contains n
different free variables.

b. 𝔐_i is an atomic matrix =Df 𝔐_i has the form D2a. (Note that the
forms D2b and c are not counted as atomic.)

c. 𝔐_i is a basic matrix =Df 𝔐_i is either an atomic matrix or the
negation of one.

d. 𝔐_i is an identity matrix (=-matrix, =-𝔐) =Df 𝔐_i has the form D2b.

e. 𝔐_i is a molecular matrix =Df 𝔐_i is either atomic or constructed out
of one or more atomic matrices with the help of connectives. (Hence,
quantifiers, ‘=’, and ‘t’ do not occur in 𝔐_i.)

f. 𝔐_i is a general matrix =Df 𝔐_i contains at least one quantifier.

g. 𝔐_i is a nongeneral matrix =Df 𝔐_i contains no quantifier. (Hence,
𝔐_i has either one of the forms D2a, b, c or is constructed out of
these forms with the help of connectives.)

h. 𝔐_i is a purely general matrix =Df 𝔐_i is general and contains no in.


D16-4. 𝔄_i is a sentence (𝔖) in 𝔏 =Df 𝔄_i is a matrix of degree 0 (that is,
without free variables).


D16-5. i is an instance of 𝔐_j (in 𝔏) =Df i is a sentence (in 𝔏)
constructed out of 𝔐_j by the substitution of individual constants (of 𝔏)
for all free variables.
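
The concepts of degree, sentence, and instance lend themselves to a short computational sketch. It is an added illustration under stated assumptions: matrices are nested tuples, variables are strings beginning with 'x', constants are strings beginning with 'a', and the tags 'not', 'or', 'and', 'forall', 'eq' as well as all function names are hypothetical choices for the sketch.

```python
from itertools import product


def free_vars(m, bound=frozenset()):
    """Set of variables occurring freely in the matrix m."""
    if m == "t":
        return set()
    op, *args = m
    if op == "not":
        return free_vars(args[0], bound)
    if op in ("or", "and"):
        return free_vars(args[0], bound) | free_vars(args[1], bound)
    if op == "forall":
        var, body = args
        return free_vars(body, bound | {var})
    # Atomic or identity matrix: the arguments are individual signs.
    return {s for s in args if s.startswith("x") and s not in bound}


def degree(m) -> int:
    """Number of different free variables of m."""
    return len(free_vars(m))


def is_sentence(m) -> bool:
    """A sentence is a matrix of degree 0."""
    return degree(m) == 0


def substitute(m, mapping):
    """Replace free occurrences of variables according to mapping."""
    if m == "t":
        return m
    op, *args = m
    if op == "not":
        return ("not", substitute(args[0], mapping))
    if op in ("or", "and"):
        return (op, substitute(args[0], mapping), substitute(args[1], mapping))
    if op == "forall":
        var, body = args
        inner = {v: c for v, c in mapping.items() if v != var}  # var is bound here
        return ("forall", var, substitute(body, inner))
    return (op, *[mapping.get(s, s) for s in args])


def instances(m, constants):
    """Yield the instances of m obtainable with the given individual constants."""
    variables = sorted(free_vars(m))
    for combo in product(constants, repeat=len(variables)):
        yield substitute(m, dict(zip(variables, combo)))
```

A matrix with one free variable thus has exactly as many instances in a finite system as that system has individual constants, and each instance is a sentence.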


D6 defines some particular kinds of sentence in analogy to D3. 


D16-6. Let i be a sentence in 𝔏.

a. i is an atomic sentence =Df i has the form D2a.

b. i is a basic sentence =Df i is an atomic sentence or the negation
of one.

c. 𝔎_i is a basic pair =Df 𝔎_i is a class of two sentences, one being an
atomic sentence and the other its negation.

d. i is an identity sentence =Df i has the form D2b.

e. i is a molecular sentence =Df i is either atomic or constructed out of
one or more atomic sentences with the help of connectives. (Hence,
variables, ‘=’, and ‘t’ do not occur in i.)

f. i is a general sentence =Df i contains at least one variable, and hence
a quantifier.

g. i is a nongeneral sentence =Df i contains no quantifier (and hence
no variable).

h. i is a purely general sentence =Df i is general and contains no in.

i. i is a singular sentence =Df i is a molecular sentence containing
occurrences of only one individual constant.

We presuppose that an alphabetical order for all primitive signs has
been established. (Which particular order is chosen is of course arbitrary
and unimportant. A simple way would be this: we take first the pr (whose
number is finite even in 𝔏_∞) ordered according to increasing degrees and
within each degree according to increasing subscripts; then the seven
signs of D1d in the order there given; finally, the in and i in this order:
in₁, i₁, in₂, i₂, in₃, i₃, etc.) On the basis of the alphabetical order of
the signs a lexicographical order of all expressions is defined (D8).

D16-8. 𝔄_i precedes 𝔄_j in the lexicographical order =Df either (a) the
first sign in 𝔄_i which differs from the corresponding sign in 𝔄_j precedes
the latter alphabetically; or (b) 𝔄_i is an initial proper part of 𝔄_j.

The alphabetical order applies only to primitive signs, and hence the
lexicographical order only to expressions consisting of primitive signs.
When we speak of the lexicographical order of expressions which contain
defined signs, we mean the order of their expansions in primitive signs.
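
The lexicographical order can be sketched as a comparison of sequences of signs. The following is an added illustration; the particular finite alphabet (three predicates, the seven single signs with 'v' written for the sign of disjunction, and three constants interleaved with three variables) is an assumption chosen for the example, as are the names ALPHABET, RANK, and precedes.

```python
ALPHABET = (
    ["P1", "P2", "R1"]                      # pr: by degree, then by subscript
    + ["~", "v", ".", "=", "t", "(", ")"]   # the seven single signs
    + ["a1", "x1", "a2", "x2", "a3", "x3"]  # in and i, interleaved
)
RANK = {sign: position for position, sign in enumerate(ALPHABET)}


def precedes(expr_i, expr_j, rank=RANK) -> bool:
    """True if expr_i precedes expr_j lexicographically.

    Expressions are lists of primitive signs. Clause (a): the first differing
    sign decides. Clause (b): a proper initial part precedes the longer one.
    """
    for sign_i, sign_j in zip(expr_i, expr_j):
        if sign_i != sign_j:
            return rank[sign_i] < rank[sign_j]
    return len(expr_i) < len(expr_j)
```

An expression never precedes itself under this comparison, and an expression containing defined signs would first have to be expanded into primitive signs before being compared.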


§ 17. Rules of Truth


Rules of truth are laid down in the form of a recursive definition of ‘true in
𝔏’ for sentences (D1) and classes of sentences (D2). Thereby, a sufficient and
necessary condition for the truth of any sentence is determined (T1). This
constitutes an interpretation for the systems 𝔏.


The rules of formation determine only the forms of sentences but not 
their interpretation. Now we have to lay down semantical rules of a more 
important kind, those which interpret the systems 𝔏, that is, which
determine the meanings of all sentences of these systems. The first, simple
step toward this aim consists in laying down rules of designation for the 
nonlogical constants, viz., the pr and the in. These rules determine which 
attributes (properties or relations) are designated by the pr and which 
individuals by the in. We shall not actually lay down these rules because 
we want to keep our deductive and inductive logic general, that is, ap- 
plicable to any particular language systems of the structures here de- 
scribed that anybody may choose; we presuppose that these rules have 
been chosen in some way or other as indicated in § 15A. Now our prob- 
lem is how to lay down further rules which serve best for the purpose for 
which we intend to use these systems. For this purpose, the rules must de- 
termine the meanings of the sentences in such a way that we can define 
with their help the following concepts: (i) truth and falsity; (ii) the L- 
concepts, especially L-truth and L-implication, which are the basis of 
deductive logic; (iii) the concept of degree of confirmation, which is the
basis of inductive logic. 

We begin by laying down rules of truth. Their purpose is to state for
every sentence and every class of sentences in any system 𝔏 a sufficient
and necessary condition for its truth. This is done by first laying down
direct rules of truth for simple sentences (D1a, b, c), and then indirect
rules for compound sentences (D1d, e, f, g) and classes of sentences (D2);
the rules of the latter kind are indirect inasmuch as they refer to the truth
of components, instances, or elements, respectively. The rules of truth
together form a recursive definition of ‘true in 𝔏’. If any sentence i is
given, then this definition tells us under what condition i is true; however,
in general, the definition alone cannot tell us whether or not the condition
is fulfilled, in other words, whether or not i is true. In order to find
out whether this is the case, we need, in general, knowledge about the
relevant facts in addition to the truth-rules.

The semantical concept of truth defined by D1 is such that the statement
‘the sentence i is true in 𝔏’ in the metalanguage conveys the same
factual information as the sentence i itself, which belongs to the object
language. (For more detailed discussions of this concept see Tarski
[Wahrheitsbegriff] and “The semantic conception of truth”, Philosophy
and Phenom. Research, 4 [1944], reprinted in Feigl [Readings]; Carnap
[Truth].)


+D17-1. Let i be a sentence in a system 𝔏. i is true in 𝔏 =Df i fulfils
one of the following conditions (a) to (g).

a. i is an atomic sentence pr_k in_j1 in_j2 ... in_jn, and the attribute
designated by pr_k holds for the individuals designated by in_j1, in_j2,
..., in_jn. (For n = 1, this means that the individual designated by in_j1
has the property designated by pr_k.)

b. i has the form in_j = in_j.

c. i is ‘t’.

d. i is ~j, and j is not true.

e. i is j ∨ k, and at least one of the two components is true.

f. i is j . k, and both components are true.

g. i is (𝔦_k)(𝔐_j), and all instances of 𝔐_j are true.


D1 leads immediately to the following theorem, which states, for every
sentence in £, either directly its truth or nontruth (in (c), (d), (e)), or a 
sufficient and necessary condition for its truth. 


+T17-1. Theorem of truth-conditions.

a. An atomic sentence of the form pr_k in_j is true if and only if the
individual designated by in_j has the property designated by pr_k.

b. An atomic sentence of the form pr_k in_j1 in_j2 ... in_jn for n > 1 is
true if and only if the relation designated by pr_k holds for the
individuals designated by in_j1, in_j2, ..., in_jn.

c. in_j = in_j is true.

d. in_j = in_k, with two different in, is not true.

e. ‘t’ is true.

f. A sentence ~j is true if and only if j is not true.

g. A sentence j ∨ k is true if and only if at least one of the two
components is true.

h. A sentence j . k is true if and only if both components are true.

i. A sentence (𝔦_k)(𝔐_j) is true if and only if all instances of 𝔐_j are
true.


T1f, g, h are clearly in accordance with the customary meanings of the
connectives, as they are usually stated with the help of truth-tables. T1c,
d, e, i are in accordance with our earlier explanations of the meanings of
the sign of identity, ‘t’, and the universal quantifier in our systems 𝔏.
Thus T1 shows that the interpretation given by the rules of truth D1 is
the one intended.

We construe, as is customary, a class of sentences as meaning the same 
as a joint assertion of its sentences. Hence D2. 


D17-2. 𝔎_i is true in 𝔏 =Df every element of 𝔎_i is a true sentence in 𝔏.


D17-3.
a. i is false in 𝔏 =Df i is a sentence in 𝔏 and not true in 𝔏.
b. 𝔎_i is false in 𝔏 =Df 𝔎_i is a class of sentences in 𝔏 and not true in 𝔏.


T17-2. 𝔎_i is false in 𝔏 if and only if 𝔎_i is a class of sentences in 𝔏 and
at least one sentence of 𝔎_i is false. (From D3b, D2.)
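
As an illustration, the rules of truth D17-1 and the definitions D17-2 and D17-3 can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the tuple encoding of sentences, the toy universe, and the names `INDIVIDUALS`, `FACTS`, `is_true`, `class_is_true` and `class_is_false` are invented for the example, and a universal sentence is represented by a function from an individual constant to its instance.

```python
# Hypothetical toy universe; all names here are illustrative assumptions.
INDIVIDUALS = ("a", "b", "c")
FACTS = {("P", ("a",)), ("P", ("c",)), ("R", ("a", "b"))}

def is_true(s):
    """Truth of a sentence, following conditions (a) to (g) of D17-1."""
    kind = s[0]
    if kind == "atom":      # (a) the attribute holds for the individuals
        return (s[1], s[2]) in FACTS
    if kind == "eq":        # (b); not true for two different in (T17-1d)
        return s[1] == s[2]
    if kind == "t":         # (c)
        return True
    if kind == "not":       # (d)
        return not is_true(s[1])
    if kind == "or":        # (e)
        return is_true(s[1]) or is_true(s[2])
    if kind == "and":       # (f)
        return is_true(s[1]) and is_true(s[2])
    if kind == "all":       # (g) every instance is true
        return all(is_true(s[1](c)) for c in INDIVIDUALS)
    raise ValueError(f"unknown sentence form: {kind}")

def class_is_true(sentences):    # D17-2
    return all(is_true(s) for s in sentences)

def class_is_false(sentences):   # D17-3b, equivalently T17-2
    return any(not is_true(s) for s in sentences)

Pa = ("atom", "P", ("a",))
Pb = ("atom", "P", ("b",))
everything_is_P = ("all", lambda x: ("atom", "P", (x,)))
```

With these assumed facts, `is_true(Pa)` holds, `is_true(everything_is_P)` fails because the instance for ‘b’ is not true, and the empty class counts as true because it has no element that could fail.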


§ 18. State-Descriptions (ℨ) and Ranges (ℜ) 


A. A class of sentences which for every atomic sentence i in a system 𝔏 contains
either i or ~i but not both describes completely a possible state of the
domain of individuals of 𝔏 with respect to all attributes (properties and relations)
designated by primitive predicates in 𝔏. As state-descriptions (ℨ), we
take in 𝔏_∞ the classes of this kind, and in 𝔏_N the corresponding conjunctions.
B. In order to insure that the state-descriptions describe possible states, the
interpretation of the individual constants and the primitive predicates must
fulfil the requirement of logical independence. For the purpose of inductive logic,
every system must furthermore fulfil the requirement of completeness, that is,
it must be sufficient for expressing all qualitative attributes occurring in the
given universe. C. The possibility of families with more than two related properties
(e.g., colors) is discussed. D. A method for interpreting all sentences in a
system 𝔏 is applied, which is different from but analogous to the method in § 17.
It consists in laying down rules which determine, for every sentence i, in which
state-descriptions it holds and in which not (D4); in other words, in which possible
cases i would be true and in which not. Thus the rules determine for every
sentence i its range (designated by ‘ℜ(i)’ or ‘ℜ_i’), that is, the class of those ℨ in
which i holds (D6); therefore, we call them rules of ranges. The concept of
range will be fundamental in our construction both of deductive and of inductive
logic.


A. State-Descriptions 


In addition to the rules of truth, we need other semantical rules to serve 
as a basis for the L-concepts and the concept of degree of confirmation, in 
other words, for deductive and inductive logic. The method which we 
shall apply for these purposes is characterized by the use of the two concepts
of state-description and range. We are led to the first by the problem
of an explication of the concept of possible cases or states-of-affairs.
Our first step in making this vague concept more precise and specific 
consists in realizing that it must be taken as relative to a language system.
Thus we come, with respect to any of our systems 𝔏, to the concept
of a logically possible state of the domain of individuals of 𝔏 with respect
to all attributes (properties and relations) designated by the pr of 𝔏. A
possible state in this sense belongs to that type of entities which may be 
expressed by sentences, hence to the type of propositions. We shall soon 
have to deal with certain classes of possible states; hence this would lead 
us to classes of propositions. Now I personally believe that there is no 
danger in speaking of propositions and classes of propositions provided it 
is done in a cautious way, that is to say, in a way which carefully abstains 
from any reification or hypostatization of propositions, in other words, 
from the attribution to propositions of anything that can correctly be 
attributed only to things. However, there are advantages in avoiding 
propositions altogether and speaking instead about the sentences or classes 
of sentences expressing them, whenever this is possible. In our present 
case this is possible, and we shall do so. We shall see that, with respect 
to any of our systems 𝔏 (in distinction to more complex language systems,
for instance, those containing real number variables), every possible
state can be expressed by a sentence or a class of sentences in the system, 
by a state-description, as we shall call it. There are chiefly two advantages 
in this method. First, we avoid a discussion of the controversial question 
whether the use of the concept of proposition would involve us in a kind 
of Platonic metaphysics and would violate the principles of empiricism. 
Second, there is the technical advantage that for this method a meta- 
language of simpler structure suffices. (To give only a brief indication: 
this method, in distinction to that using propositions, can be applied in 
an extensional (truth-functional) metalanguage; for more detailed ex- 
planations see [Semantics] §§ 18, 19 and [Meaning] §§ 2 and 38.) 

A state-description for a system 𝔏 in the sense indicated must state for
every individual of 𝔏 and for every property designated by a primitive
predicate of 𝔏 whether or not this individual has this property; and
analogously for relations. In other words, if i is any atomic sentence in 𝔏,
a state-description for 𝔏 must either affirm or deny i, hence it must affirm
exactly one sentence of the basic pair {i, ~i}. Every possible state can
be described by a class of sentences in 𝔏 which contains exactly one sentence
from every basic pair in 𝔏. As state-descriptions for 𝔏_∞ we shall
actually take the classes described (D1b). In a finite system 𝔏_N, every
class of the kind described is finite (for example, {‘Pa’, ‘~Pb’, ‘Pc’} in
a system with three in and one pr of degree one). Therefore, in 𝔏_N, we can
take as state-descriptions instead of those classes the corresponding conjunctions
(in the example mentioned, ‘Pa . ~Pb . Pc’); in order to have
only one state-description for every possible state, we add a requirement
which uniquely determines the order of the conjunctive components
(D1a). For the state-descriptions in 𝔏_N we use in the metalanguage the
sign ‘_Nℨ’, and for those in 𝔏_∞ ‘_∞ℨ’; however, we shall usually write simply
‘ℨ’ if the context of the discussion makes it sufficiently clear which system
or systems are concerned. [‘ℨ’ is taken from the German ‘Zustand’.]


+D18-1.

a. i is a state-description in 𝔏_N (‘_Nℨ’ or briefly ‘ℨ’) =Df i is a conjunction
   which contains as components exactly one sentence from every
   basic pair in 𝔏_N and no other sentences, these components being arranged
   in their lexicographical order (D16-8).

b. 𝔎_i is a state-description in 𝔏_∞ (‘_∞ℨ’ or briefly ‘ℨ’) =Df 𝔎_i contains
   exactly one sentence from every basic pair in 𝔏_∞ and no other elements.
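
The construction in D18-1a can be sketched for a small finite system. The following Python fragment is an illustration only: the function names, the dictionary that maps each pr to its degree, and the use of sorted enumeration order as a stand-in for the lexicographical order of D16-8 are assumptions of the sketch.

```python
from itertools import product

def atomic_sentences(predicates, individuals):
    """predicates maps each pr to its degree; returns atoms in a fixed order."""
    atoms = []
    for name in sorted(predicates):
        for args in product(individuals, repeat=predicates[name]):
            atoms.append((name, args))
    return atoms

def state_descriptions(predicates, individuals):
    """Yield every choice of exactly one sentence from each basic pair."""
    atoms = atomic_sentences(predicates, individuals)
    for signs in product((True, False), repeat=len(atoms)):
        yield tuple(zip(atoms, signs))

def show(sd):
    """Render a state-description as a conjunction of basic sentences."""
    return " . ".join(
        ("" if sign else "~") + name + "".join(args) for (name, args), sign in sd
    )

# The example of the text: three in and one pr of degree one.
examples = [show(sd) for sd in state_descriptions({"P": 1}, ("a", "b", "c"))]
```

For three individual constants and one pr of degree one there are three basic pairs, so the enumeration yields 2³ = 8 state-descriptions, among them the conjunction ‘Pa . ~Pb . Pc’ used as the example above.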


B. The Requirement of Logical Independence and Completeness 


If the conjunctions and classes in a system 𝔏 which we call state-descriptions
are to fulfil their purpose of describing possible states of the
universe of 𝔏, the interpretation of 𝔏 must fulfil the following requirement
of logical independence, here formulated in three parts; parts II and III
follow from I.

I. The interpretation of 𝔏 must be such that the atomic sentences are
logically independent of each other; that is to say, it must never occur
that a class containing some atomic sentences and the negations of other
atomic sentences logically entails (contains in its meaning) another atomic
sentence or its negation. If this requirement is not fulfilled, then some
state-description will be self-contradictory and hence not describe a possible
state. (This holds for any state-description containing the class
specified and, in addition, the negation of the other atomic sentence or this
sentence itself, respectively.) Suppose, for example, that i and j were
atomic sentences in 𝔏 such that i entailed j. [The term ‘entailment’ is
here used not as an exact, systematic term but as a common term whose
meaning may be roughly indicated by saying that the content of i is the
same as that of i . j, that is, the joint assertion of i and j. A corresponding
exact concept, an explicatum for the concept of entailment as explicandum,
will be introduced later (D20-1c) under the term ‘L-implication’.
An analogous remark holds for the term ‘self-contradictory’.] Then any
state-description containing both i and ~j would be self-contradictory
because it would assert both j and ~j. In order to fulfil this requirement
for the atomic sentences, the in and pr must fulfil the following requirements
II and III.

II. The individual constants in 𝔏 must be interpreted in such a manner
that they designate different and separate individuals. If, for instance, ‘a’
and ‘b’ designated the same individual, ‘Pa’ would entail ‘Pb’, and
‘Pa . ~Pb’ would be self-contradictory. If the individual a were a spatiotemporal
part of b and if ‘P’ designated the property of being hot throughout,
then ‘Pb . ~Pa’ would be self-contradictory.

III. The primitive predicates in 𝔏 must be interpreted in such a manner
that they designate attributes (properties or relations) which are logically
independent of each other. For instance, if the properties Raven and Black
are understood in such a way that the first entails the second (logically,
not merely by a law of nature) and if they were designated by two pr in 𝔏,
say, ‘P₁’ and ‘P₂’, then ‘P₁a . ~P₂a’ would be self-contradictory. If the
property Warm and the relation Warmer were designated by two pr,
say, ‘P’ and ‘R’, then ‘Pa . ~Pb . Rba’ would be self-contradictory.
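
The effect of a violated independence requirement can be made concrete with the Raven and Black example. The sketch below is an illustration under assumptions: the atom names, the single individual ‘a’, and the helper `violates_raven_entails_black` are hypothetical, and the logical dependence is simply hard-coded as a filter.

```python
from itertools import product

# Hypothetical system: one individual a, two pr read as P1 = Raven, P2 = Black.
ATOMS = ("P1a", "P2a")

def candidates():
    """All classes containing exactly one sentence from each basic pair."""
    for signs in product((True, False), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, signs))

def violates_raven_entails_black(sd):
    # Under the dependent reading, 'P1a . ~P2a' cannot describe a possible state.
    return sd["P1a"] and not sd["P2a"]

all_candidates = list(candidates())
possible = [sd for sd in all_candidates if not violates_raven_entails_black(sd)]
```

Of the four formal candidates only three describe possible states under this reading, which is exactly the situation the requirement is meant to exclude: every formally constructed state-description should describe a possible state.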

The requirement of independence concerns only the interpretation of 
the nonlogical signs (in and pr) of our language-systems 𝔏. For the purely
logical work both in deductive logic (in this chapter) and in inductive 
logic (in the remainder of this book), we need not consider any particular 
interpretation of the nonlogical signs. If, however, it is desired to give a 
specific interpretation in order to see how deductive and inductive logic 
work in application to a particular universe, real or imaginary, then care 
must be taken that the interpretation chosen fulfil the requirement of in- 
dependence. (This requirement belongs, not to deductive or inductive 
logic proper, but to the methodology of logic [see § 44A]; the same holds 
for the requirement of completeness to be discussed soon.) If an interpre- 
tation is given, it may not always be easy to determine whether the re- 
quirement is fulfilled. But it is not difficult to choose an interpretation for 
which we can be practically certain that the requirement is fulfilled. It 
seems perhaps best to imagine as individuals in a system 𝔏, not extended
regions like the physical bodies or events in our actual world, but rather 
positions like the space-time points in our actual world, hence unextended, 
indivisible entities. Since, however, the number of individuals in a system
𝔏 is either finite or denumerably infinite, they cannot form a continuum,
as the space-time points of the world described in physics do,
but must instead be imagined as isolated positions in a discrete (i.e., noncontinuous)
universe. Since this is a simplified universe, the qualities and
relations with which we are acquainted in our actual world cannot, strictly 
speaking, be applied. For instance, a color occurs in the actual world only 
as property of an extended, continuous area. Nevertheless, in order to 
visualize the simplified universe to which a system 𝔏 refers, we may
imagine as attributes designated by the pr in 𝔏 something similar to the
directly observable qualities and relations which we perceive in our world, 
e.g., something like Blue, Hot, Hard, Darker, and the like, but now attributed
to the isolated positions which we take as individuals in 𝔏. If
one wants to study an inductive problem involving a complex property 
W, it is advisable to take ‘W’ not as a pr but rather as a predicate defined
on the basis of suitable pr which designate simpler concepts. It was ex- 
plained earlier (§ 15B) that it is advisable to express positional attributes 
of individuals, corresponding to spatiotemporal attributes in our actual 
world, not with the help of primitive predicates but by the form of co- 
ordinate expressions used as individual expressions in an extended form 
of the systems (coordinate languages). Consequently, it seems best to 
choose as designata for the pr only attributes of a purely qualitative nature
rather than those which are either positional or mixed, that is, containing
both qualitative and positional components (for example, the
property expressed by ‘x is red or identical with b’ or the relation expressed
by ‘x is darker and earlier than y’).


In distinction to the requirement of independence, which seems essential 
both for deductive and for inductive logic, I regard the requirement now 
to be introduced as necessary for inductive logic, although it is not neces- 
sary for deductive logic and therefore has so far not been discussed by 
logicians, it seems. This is the requirement of completeness: the set of 
the pr in a system 𝔏 must be sufficient for expressing every qualitative
attribute of the individuals in the universe of 𝔏, that is, every respect
in which two positions in this universe may be found by observation to 
differ qualitatively. This requirement can be divided into the following
two parts I and II. 

I. It is assumed that any two individuals differ only in a finite number 
of respects. This assumption seems related to Keynes’s principle of limited 
variety: “We seem to need some such assumption as that the amount of 
variety in the universe is limited in such a way that there is no one object 
so complex that its qualities fall into an infinite number of independent 
groups” ([Probab.], p. 258). This assumption, if applied to our actual 
world, may at first appear as rather doubtful in view of the fact that it
seems impossible to give an exhaustive description of any physical body. 
However, this is a different question; a physical body is a continuum (a 
nondenumerable set) of space-time positions, while our assumption refers 
only to one individual (or to n individuals in the case of an n-adic relation).
If we think of the world of things with perceptible qualities, then
it is not implausible to assume that the number of perceptible qualities, 
say, shades of color, sounds, smells, etc., is finite though rather large. And 
if, on the other hand, we think of the world as conceived in theoretical 
physics, then we find a finite and, indeed, a very small number of fundamental
magnitudes to which, according to the assumption of physicists,
the great variety of phenomena is ultimately reducible. Thus in either 
case the assumption I is not as implausible as it may appear at first. 

II. If a language system 𝔏 is to be constructed for the purpose of applying
inductive logic to a given universe, it is required that a system of pr be
taken which is sufficiently comprehensive for expressing all qualitative
attributes exhibited by the individuals in the given universe. For the
purposes of inductive logic we may leave aside the epistemological question
as to how we are able to know whether this requirement is fulfilled by
a given system 𝔏 for a universe with which we are confronted, just as
both deductive and inductive logic leave it to epistemology or methodology
of empirical knowledge to answer the question as to the procedure by
which we acquire knowledge of the premises or evidence. The requirement
may also be formulated conversely: if a system 𝔏 is given and a universe,
real or imaginary, is to be chosen as an illustration or model for 𝔏 for the
purposes of inductive logic, then this universe must be neither richer nor
poorer in qualitative attributes than 𝔏 indicates. For example, let 𝔏 contain
only two pr, both of degree one, say ‘P₁’ and ‘P₂’. Suppose that we
decide to interpret them as designating the properties Bright and Hot, respectively.
Then we must imagine a universe whose positions differ only
with respect to Bright and Not-Bright and with respect to Hot and Not-Hot.
A richer universe in which furthermore the distinction between
Hard and Not-Hard can be made, is not a fitting interpretation for 𝔏 for
the purpose of inductive logic, although in deductive logic 𝔏 could, of
course, be used for this universe.

At the present stage of research concerning inductive logic and the con- 
ditions of its applicability, it is not yet quite clear whether the require- 
ment of completeness in its full strength is necessary or whether a modi- 
fied and weaker form of it might be sufficient. At the present moment we 
need not go further into a discussion of the problems connected with this 
requirement. The requirement is, it seems, not necessary for those parts
of inductive logic which will be developed in the present volume. We shall 
resume the discussion of these problems in the second volume, in connec- 
tion with our system of quantitative inductive logic. Then the function 
of the requirement within the procedures of inductive logic will become 
clearer. (For a discussion of a special problem connected with the require- 
ment of completeness see [Application] § 5, third paragraph.) 

It might seem plausible to apply the requirement of completeness not 
only to the pr but also to the in in 𝔏. This would mean that 𝔏, in order to
be adequate for a given universe, must either (a) contain in for all in- 
dividuals of the given universe (which, of course, is possible only if the 
number of individuals is not more than denumerable) or (b) at least con- 
tain individual variables whose domain of values comprehends all indi- 
viduals of the given universe. However, for our system of inductive logic 
it is not necessary to lay down this requirement because, as we shall see 
(in Vol. II), the degree of confirmation of h on e is not changed by the 
existence of individuals not referred to in h and e. In other words, if h and e
contain no variables, the degree of confirmation of h on e is the same
for all systems 𝔏 in which h and e occur, provided these systems contain
the same pr and differ only in the number of in. 

We shall later (§ 45A) discuss the problem of how a system of inductive 
logic pertaining to simplified universes can nevertheless be applied, under 
certain conditions, to the actual world. There we shall also lay down a 
further requirement, that of total evidence; it does not concern the in- 
terpretation of the systems, as the two requirements just discussed do, 
but is essential for the application of results of inductive logic to given 
knowledge situations. 


C. Families of Related Properties 


Suppose that two or more properties are related to each other in the 
following way: every individual must necessarily have one and only one 
of these properties; and this is a matter of logical necessity. That is, it 
follows from the meanings alone; it is not merely a contingent law of na- 
ture for which the occurrence of counterinstances remains always possible. 
We speak in this case of a family of related properties. Analogously, we 
speak of related relations. If a system 𝔏 contains pr designating related
attributes, we call them a family of related primitive predicates. For in- 
stance, the properties Cold, Luke-Warm, Medium-Warm, Hot may con- 
stitute a family in a certain universe and may then be designated by four 
related pr; likewise the properties Blue, Green, Yellow, Red in a universe 
where these are the only possible colors and no individual can be colorless
(otherwise we should have to add Colorless and thus form a five-member-family).
If ‘P’ is a pr belonging to a family of n pr, ‘~P’ is logically equivalent
to the disjunction of the other n − 1 pr. Hence in a two-member-family
consisting of ‘P₁’ and ‘P₂’, ‘~P₁’ is logically equivalent to ‘P₂’,
and ‘~P₂’ to ‘P₁’. Thus in this case we do not actually need both pr; we can
express the two properties either by ‘P₁’ and ‘~P₁’ or by ‘~P₂’ and ‘P₂’,
and it does not matter which of these two ways we choose. (Similarly, we
can express n related properties by n − 1 pr and the negation of the
disjunction of these pr.) For the sake of simplicity, we shall always presuppose
that our systems 𝔏 contain only two-member-families; one attribute
in each family is then designated by a pr and the other by its negation.
By this restriction we avoid some complications in certain definitions
and theorems. If, however, someone wishes to apply our theory to systems 
with larger families of pr, it is easy to make the necessary modifications. 


The chief modifications are as follows. Let a system 𝔏 containing q families
of basic attributes be given. Let the pth family (p = 1, 2, . . . , q) contain n_p
attributes of degree d_p. For the sake of uniformity let us assume that in 𝔏 all
these basic attributes, even those belonging to a two-member-family, are designated
by pr. Then the basic matrices (D16-3c) and sentences (D16-6b) coincide
with the atomic ones; they do not contain the sign of negation. A class of
n_p atomic sentences formed with the n_p pr of the pth family and containing the
same in (or d_p-tuple of in) is called a family of (related) atomic sentences. A
state-description (D18-1) is defined as a conjunction or class containing exactly
one atomic sentence from each family. (The requirement of independence applies
here only to atomic sentences and primitive predicates of different families.)
The definitions of range (D18-6) and of the L-concepts (§ 20) remain the same.
It follows that the pr of each family form a division (D25-4). The concept of a
correlation of basic matrices (§ 28) is replaced by the simpler concept of a correlation
of the pr, defined analogously. A pr of the pth family has the degree d_p;
therefore an argument expression fitting to it in an atomic sentence is an ordered
d_p-tuple of in. The number of such d_p-tuples in 𝔏_N is N^(d_p). This is likewise the
number of atomic sentences with a given pr of the pth family. Therefore the
number of all atomic sentences with pr of the pth family in 𝔏_N is n_p N^(d_p); and
the number of all atomic sentences in 𝔏_N is Σ_p [n_p N^(d_p)]. A state-description
contains for every d_p-tuple of in exactly one of the n_p atomic sentences with
the n_p pr of the pth family. Thus there are n_p^(N^(d_p)) possibilities for those
subconjunctions of state-descriptions which contain only pr of the pth family.
Hence the number of state-descriptions in 𝔏_N (T29-1a) is ζ = Π_p [n_p^(N^(d_p))].
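
The counting argument of this remark can be checked by brute force for small cases. The following sketch is illustrative only: a family is represented by a hypothetical pair (n_p, d_p), and the function names are assumptions of the example, not part of the system.

```python
from itertools import product
from math import prod

def count_atomic_sentences(families, n_individuals):
    """Sum over families of n_p * N**d_p."""
    return sum(n * n_individuals ** d for n, d in families)

def count_by_formula(families, n_individuals):
    """Product over families of n_p ** (N ** d_p)."""
    return prod(n ** (n_individuals ** d) for n, d in families)

def count_by_enumeration(families, n_individuals):
    """One slot per family and per d_p-tuple of in; each slot has n_p choices."""
    individuals = range(n_individuals)
    slots = [
        range(n)
        for n, d in families
        for _ in product(individuals, repeat=d)
    ]
    return sum(1 for _ in product(*slots))

# Hypothetical system: one family of four properties, one family of two
# dyadic relations, and N = 2 individuals.
FAMILIES = [(4, 1), (2, 2)]
```

For this assumed system the formula gives 4² · 2⁴ = 256 state-descriptions, and the explicit enumeration of one choice per slot arrives at the same number; the count of atomic sentences is 4 · 2 + 2 · 4 = 16.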

Now let us consider the case in which all pr in 𝔏 are of degree one, hence
designate properties, not relations (§ 31). A Q-predicate-expression (D31-1a)
is here a conjunction of pr containing exactly one pr from each family. Therefore
the number of Q’s (T31-1) is κ = Π_p n_p. The definitions of width and relative
width (D32-1) remain unchanged. But the relative width of a pr of the pth
family is not necessarily 1/2 (T33-1b) but 1/n_p. The relative width of a conjunction
of several nonrelated pr is the product of their separate relative widths.
If expressed in terms of κ, certain important values are the same as before. This
holds, in particular, for the number of ℨ (ζ = κ^N, T35-1b), the number of 𝔖𝔱𝔯
(τ, T35-1d), and the number of those ℨ which have a given set of Q-numbers
(ζ_i, T35-4). The latter two numbers will later be taken as the basis for the
determination of the measure-function m* and the degree of confirmation c*
(§ 110, (2)).
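
These values can again be checked on a small example. The sketch below assumes two hypothetical families of degree one and takes the relative width of a conjunction of pr to be the share of Q-predicates that contain all of them; the family contents and the function names are invented for the illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical families of related primitive predicates, all of degree one.
FAMILIES = [("Cold", "LukeWarm", "MediumWarm", "Hot"), ("Bright", "NotBright")]

# A Q-predicate picks exactly one pr from each family.
Q_PREDICATES = list(product(*FAMILIES))
KAPPA = len(Q_PREDICATES)

def relative_width(*prs):
    """Share of the Q-predicates that contain every pr in `prs`."""
    width = sum(1 for q in Q_PREDICATES if all(pr in q for pr in prs))
    return Fraction(width, KAPPA)

def count_state_descriptions(n_individuals):
    """Each individual receives exactly one Q-predicate."""
    return sum(1 for _ in product(Q_PREDICATES, repeat=n_individuals))
```

Here κ = 4 · 2 = 8, a pr of the four-member family has relative width 1/4, the conjunction of ‘Hot’ and ‘Bright’ has relative width 1/4 · 1/2 = 1/8, and for N = 2 the number of state-descriptions is κ² = 64.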


D. The Range of a Sentence 


We write ‘V_ℨ’ for the class of all state-descriptions in a given system
(the universal range) and ‘Λ_ℨ’ for the null class of state-descriptions (the
null range). When necessary, a left-hand subscript ‘N’ or ‘∞’ is added.


D18-2.

a. _N V_ℨ (or simply V_ℨ) =Df the class of all ℨ in 𝔏_N.
b. _∞ V_ℨ (or V_ℨ) =Df the class of all ℨ in 𝔏_∞.
c. _N Λ_ℨ (or Λ_ℨ) =Df the null class of ℨ in 𝔏_N.
d. _∞ Λ_ℨ (or Λ_ℨ) =Df the null class of ℨ in 𝔏_∞.


If we understand the meaning of a sentence, then we know in which of
the possible cases it would hold and in which not. And if we wish to give an
interpretation to a sentence, in other words, to state its meaning, then one
possible method for doing so consists just in saying in which of the possible
cases it holds and in which not. Speaking in terms of state-descriptions
instead of the possible cases which they describe, we can give an interpretation
to a sentence by saying in which state-descriptions it holds and in
which not. We shall do this now by rules which state the conditions for ‘i
holds in ℨ_k’ for all sentences in a system 𝔏. This concept has a certain
analogy to that of truth; it is, so to speak, conditional truth because it
means: ‘i would be true if the possible case described by ℨ_k were the real
case’; in other words, ‘i would be true if the individuals had just those
properties and relations which are attributed to them by ℨ_k’. Therefore,
D4 is quite analogous to the earlier definition of truth (D17-1); and we
can easily see that D4 is in accordance with the interpretation intended
just as the earlier definition was.

D3 serves merely for the introduction of a convenient abbreviating 
phrase. 


D18-3. Let i be a sentence in a system 𝔏 and ℨ_k a state-description
in 𝔏. i belongs to ℨ_k =Df i is a basic sentence in 𝔏 and occurs in ℨ_k either
as a conjunctive component (if 𝔏 is finite) or as an element (if 𝔏 is infinite).


+D18-4. Let i be a sentence in a system 𝔏, and ℨ_k a state-description
in 𝔏. i holds in ℨ_k =Df one of the following conditions (a) to (g) is fulfilled.
a. i is an atomic sentence and belongs to ℨ_k (D3).
b. i has the form in_j = in_j.
c. i is ‘t’.
d. i is ~j, and j does not hold in ℨ_k.
e. i is j ∨ k, and at least one of the two components holds in ℨ_k.
f. i is j . k, and both components hold in ℨ_k.
g. i is (i_k)(𝔐_k), and all instances of 𝔐_k hold in ℨ_k.

D18-5. Let 𝔎_i be a class of sentences in 𝔏. 𝔎_i holds in ℨ_k =Df every sentence
of 𝔎_i holds in ℨ_k.
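
The rules D18-4 and D18-5 can be sketched as a recursive check. This is an illustration under assumptions: a state-description of a finite system is encoded as the set of atomic sentences it affirms (every other atomic sentence counts as negated), sentences are tuples, a universal sentence is given by a function producing its instances, and all names are invented for the example.

```python
INDIVIDUALS = ("a", "b")   # hypothetical finite system with two in

def holds(s, sd):
    """Does sentence s hold in the state-description sd (D18-4)?"""
    kind = s[0]
    if kind == "atom":      # (a) the atomic sentence belongs to sd
        return s in sd
    if kind == "eq":        # (b); with two different in it holds in no sd
        return s[1] == s[2]
    if kind == "t":         # (c)
        return True
    if kind == "not":       # (d)
        return not holds(s[1], sd)
    if kind == "or":        # (e)
        return holds(s[1], sd) or holds(s[2], sd)
    if kind == "and":       # (f)
        return holds(s[1], sd) and holds(s[2], sd)
    if kind == "all":       # (g) every instance holds
        return all(holds(s[1](c), sd) for c in INDIVIDUALS)
    raise ValueError(f"unknown sentence form: {kind}")

def class_holds(sentences, sd):   # D18-5
    return all(holds(s, sd) for s in sentences)

Pa = ("atom", "P", ("a",))
Pb = ("atom", "P", ("b",))
sd_example = frozenset({Pa})      # stands for 'Pa . ~Pb'
```

In `sd_example` the sentence ‘~Pb’ holds and the universal sentence over ‘P’ does not, because its instance for ‘b’ fails. The only difference from the rules of truth is that the facts are replaced by the state-description passed as an argument.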

According to our previous explanation, the meaning of a sentence i is
determined by the class of those ℨ in which i holds. We call this class the
L-range or, briefly, the range of i (D6); for this, we write in signs ‘ℜ(i)’
or simply ‘ℜ_i’; and for the range of 𝔎_i ‘ℜ(𝔎_i)’ (here no shorter form).


+D18-6. Let i be a sentence in a system 𝔏 and 𝔎_i a class of sentences
in 𝔏.
a. The range of i in 𝔏 (‘ℜ(i)’, ‘ℜ_i’) =Df the class of those ℨ in 𝔏 in
   which i holds.
b. The range of 𝔎_i in 𝔏 (‘ℜ(𝔎_i)’) =Df the class of those ℨ in 𝔏 in which
   𝔎_i holds.

On the basis of this definition D6 and the rules constituting D4 and D5,
we easily see that the ranges for all forms of sentences and for 𝔎_i are determined
by the following theorem.


+T18-1. Theorem of ranges. Let i be a sentence and 𝔎_i a class of
sentences (in a system 𝔏).
a. If i is an atomic sentence, ℜ_i is the class of those ℨ to which i belongs.
b. If i is in_j = in_j, ℜ_i is V_ℨ.
c. If i is in_j = in_k with two different in, ℜ_i is Λ_ℨ.
d. If i is ‘t’, ℜ_i is V_ℨ.
e. If i is ~j, ℜ_i is V_ℨ − ℜ_j.
f. If i is j ∨ k, ℜ_i is ℜ_j ∪ ℜ_k.
g. If i is j . k, ℜ_i is ℜ_j ∩ ℜ_k.
h. If i is (i_k)(𝔐_k), ℜ_i is the class-product of the ranges of the
   instances of 𝔐_k.
i. If 𝔎_i is non-empty, ℜ(𝔎_i) is the class-product of the ranges of the
   sentences of 𝔎_i.
j. If 𝔎_i is empty, ℜ(𝔎_i) is V_ℨ.
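
The theorem of ranges can be read as a recipe for computing ranges directly by operations on classes. The sketch below follows clauses (a) to (j) for a small hypothetical system with two individual constants and two pr of degree one; the encoding of state-descriptions as sets of affirmed atomic sentences and all names are assumptions of the illustration.

```python
from itertools import combinations

INDIVIDUALS = ("a", "b")
ATOMS = [("atom", p, (c,)) for p in ("P", "Q") for c in INDIVIDUALS]

# Universal range: every subset of ATOMS stands for one state-description.
V = frozenset(
    frozenset(chosen)
    for r in range(len(ATOMS) + 1)
    for chosen in combinations(ATOMS, r)
)
EMPTY = frozenset()   # null range

def rng(s):
    """Range of a sentence, computed clause by clause as in T18-1."""
    kind = s[0]
    if kind == "atom":                       # (a)
        return frozenset(sd for sd in V if s in sd)
    if kind == "eq":                         # (b), (c)
        return V if s[1] == s[2] else EMPTY
    if kind == "t":                          # (d)
        return V
    if kind == "not":                        # (e)
        return V - rng(s[1])
    if kind == "or":                         # (f)
        return rng(s[1]) | rng(s[2])
    if kind == "and":                        # (g)
        return rng(s[1]) & rng(s[2])
    if kind == "all":                        # (h) class-product over instances
        result = V
        for c in INDIVIDUALS:
            result &= rng(s[1](c))
        return result
    raise ValueError(f"unknown sentence form: {kind}")

def class_range(sentences):                  # (i), (j)
    result = V
    for s in sentences:
        result &= rng(s)
    return result
```

In this system there are 16 state-descriptions; an atomic sentence has a range of 8, and a sentence of the form j . ~j has the null range.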

T1 shows that the rules of D4 and D5 determine indirectly the ranges
of all sentences and classes of sentences in 𝔏. Therefore we may call these
rules rules of ranges (instead of ‘rules of holding in a state-description’).
We could, of course, formulate the rules just as well directly for ranges, in
a way similar to T1; we have chosen the other way merely to facilitate
the understanding.

The rules of ranges are the fundamental semantical rules for our systems
𝔏. The concept of range, which is determined by these rules, will, in
our construction, be the cornerstone both for deductive and for inductive 
logic, since we shall define with its help both the L-concepts and the con- 
cept of degree of confirmation. 

We have twice given an interpretation for the sentences in 𝔏, first by the
rules of truth (D17-1), then again by the rules of ranges (D18-4). We have done
so only in order to make the definition of truth more easily understandable. In
fact, truth could be defined, instead of by D17-1, on the basis of D18-4 in the
following way. ‘True atomic sentence’ would be defined in a way similar to
D17-1a. Then the true state-description ℨ_T would be defined as that state-description
to which all true atomic sentences and the negations of the other
atomic sentences belong. Finally, we define: i is true if it holds in ℨ_T.
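
The alternative route to truth described in this remark can be sketched briefly. The fragment is restricted to the truth-functional connectives for brevity, and the atom names, the assumed facts in `TRUE_ATOMS`, and the function names are hypothetical.

```python
# Hypothetical facts: which atomic sentences are true in the real case.
ATOMS = ("Pa", "Pb", "Pc")
TRUE_ATOMS = {"Pa", "Pc"}

# The true state-description: true atoms affirmed, all other atoms negated.
Z_T = {atom: (atom in TRUE_ATOMS) for atom in ATOMS}

def holds(s, sd):
    """Truth-functional part of D18-4; atoms are strings, compounds tuples."""
    if isinstance(s, str):
        return sd[s]
    op = s[0]
    if op == "not":
        return not holds(s[1], sd)
    if op == "or":
        return holds(s[1], sd) or holds(s[2], sd)
    if op == "and":
        return holds(s[1], sd) and holds(s[2], sd)
    raise ValueError(f"unknown connective: {op}")

def is_true(s):
    """i is true if it holds in the true state-description."""
    return holds(s, Z_T)
```

With these assumed facts the true state-description corresponds to ‘Pa . ~Pb . Pc’, so ‘Pa . ~Pb’ comes out true and ‘Pb ∨ ~Pc’ does not.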


§ 19. Theorems on State-Descriptions and Ranges 


Some theorems concerning ℨ and ℜ are stated, as lemmas for later theorems
in deductive and inductive logic.


We state here some theorems on ℨ and ℜ. They are not of much interest
in themselves but serve as lemmas for later theorems on L-concepts
and on confirmation. These theorems hold, if not indicated otherwise,
with respect to any finite or infinite system 𝔏 for the sentences and the
ℨ of that system.


T19-1. For every atomic sentence i and every ℨ_j, either i or ~i belongs
to ℨ_j, but not both. (From D18-1, D18-3.)

+T19-2. For every sentence i (of any form, not only atomic) and every
ℨ_j, either i or ~i holds in ℨ_j, but not both. (From D18-4d.)

T19-3. If i is a basic sentence, ℜ_i is the class of those ℨ to which i belongs.


Proof. 1. For an atomic i, from T18-1a. 2. Let i be ~j. Then j is atomic, and
ℜ_j is the class of those ℨ to which j belongs (1). ℜ_i is the class of the remaining
ℨ (T18-1e), hence of those ℨ to which j does not belong. This is the class of those
ℨ to which ~j belongs, which is i.


As mentioned earlier (§ 16), we speak generally of conjunctions with
n components, for any finite n > 0; if i has not the form j . k, we regard
i as a conjunction with one component, which is i itself.

D19-1. j is a subconjunction of i =Df every conjunctive component of j
is also a conjunctive component of i.

T19-4.
a. For 𝔏_N. If ℨ_i is a subconjunction of ℨ_j, then it is the same as ℨ_j.
b. For 𝔏_∞. If ℨ_i is a subclass of ℨ_j, then it is the same as ℨ_j.

T19-5.
a. For 𝔏_∞. Let 𝔎_i be a class of basic sentences not containing any basic
   pair. Then ℜ(𝔎_i) is the class of those ℨ of which 𝔎_i is a subclass.
Proof. We suppose that 𝔎_i is non-empty; otherwise, the theorem follows
from T18-1j. Then ℜ(𝔎_i) is the class-product of the ranges of the sentences of
𝔎_i (T18-1i). 1. Let j be any sentence of 𝔎_i and ℨ_i any ℨ of which 𝔎_i is a subclass.
Hence j ∈ ℨ_i. Since j is a basic sentence, ℜ_j is the class of those ℨ to which
j belongs (T3). Hence ℨ_i ∈ ℜ_j. Since this holds for every sentence j of 𝔎_i, ℨ_i belongs
to the class-product of the ranges of the sentences of 𝔎_i, hence to ℜ(𝔎_i).
2. Let ℨ_k be any ℨ of which 𝔎_i is not a subclass. Then there is a basic sentence i
belonging to 𝔎_i but not to ℨ_k. Let k be the other sentence of the basic pair of i.
Then k belongs to ℨ_k (D18-1b) but not to 𝔎_i. i belongs to every ℨ in ℜ_i (T3);
hence ℨ_k is not an element of ℜ_i. Therefore, since i ∈ 𝔎_i, ℨ_k cannot belong to the
class-product of the ranges of the sentences of 𝔎_i, that is, to ℜ(𝔎_i).


b. For 𝔏_N. Let i be a conjunction of n basic sentences (n ≥ 1) among
   which there is no atomic sentence together with its negation. Then
   ℜ_i is the class of those ℨ of which i is a subconjunction (D1).

Proof analogous to (a), with T18-1g instead of T18-1i.


T19-6. For every ℨ_i, ℜ(ℨ_i) is {ℨ_i} (that is to say, the range of ℨ_i contains only ℨ_i itself).

Proof. 1. For 𝔏_N, from T5b, T4a. 2. For 𝔏_∞, from T5a, T4b.


The following theorem T8 deals with two consecutive systems in the sequence of finite systems, viz., with 𝔏_N and 𝔏_{N+1}. There is just one individual constant 'a_{N+1}' which is new in 𝔏_{N+1}, that is, not already occurring in 𝔏_N. Hence the atomic sentences which are new in 𝔏_{N+1} are just those which contain 'a_{N+1}'. Let their number be m, and let n be 2^m. Then there are n selections of new basic sentences (T40-31g), that is, classes containing exactly one sentence from each new basic pair and no other elements. Let us use here (for T8 only) 'ℨ' and 'ℜ' for the state-descriptions and ranges of 𝔏_N, but 'ℨ′' and 'ℜ′' for those of 𝔏_{N+1}. For every ℨ′_j, there is exactly one ℨ_i which is a subconjunction (D1) of it, namely, that which contains all its old basic sentences. And the new basic sentences in ℨ′_j form one of the n selections mentioned above. In this way, n ℨ′ grow out of ℨ_i by the adjunction of one of the n selections each; hence ℨ_i is a subconjunction of all of them. This situation must be kept in mind for the following theorem and its proof.
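
The relation between the state-descriptions of two consecutive finite systems can be checked by enumeration for small cases. The following Python sketch is an illustration only, under assumptions not found in the text: two primitive one-place predicates 'P' and 'Q', constants written a1, a2, ..., and the hypothetical helper names atoms, state_descriptions, and extensions. A state-description is represented by the class of its basic sentences, each given as a pair of an atomic sentence and a truth-value.

```python
from itertools import product

PREDICATES = ("P", "Q")  # assumption: two primitive one-place predicates


def atoms(n):
    """Atomic sentences of the finite system with constants a1, ..., an."""
    return [f"{p}a{k}" for k in range(1, n + 1) for p in PREDICATES]


def state_descriptions(n):
    """Each state-description is a frozenset of (atomic sentence, truth-value) pairs."""
    names = atoms(n)
    return [frozenset(zip(names, vals))
            for vals in product((True, False), repeat=len(names))]


def extensions(z, n):
    """State-descriptions of system n+1 containing z (from system n) as a subconjunction."""
    return [z2 for z2 in state_descriptions(n + 1) if z <= z2]


N = 2
m = len(atoms(N + 1)) - len(atoms(N))  # number of new atomic sentences
for z in state_descriptions(N):
    # each old state-description grows into 2^m new ones
    assert len(extensions(z, N)) == 2 ** m
for z2 in state_descriptions(N + 1):
    # each new state-description contains exactly one old one
    assert sum(z <= z2 for z in state_descriptions(N)) == 1
```

The subset test `z <= z2` plays the part of "is a subconjunction of"; it is adequate here only because every state-description is stored as a class of basic sentences.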
T19-8. Let i be any nongeneral sentence in 𝔏_N, and hence in 𝔏_{N+1}. Then ℜ′_i is the class of those ℨ′ which contain the sentences in ℜ_i as subconjunctions.

Proof. Since i is nongeneral (D16-6g), it is constructed out of simple sentences of the forms mentioned in T18-1a, b, c, d by a finite number n (≥ 0) of applications of the connectives mentioned in T18-1e, f, g. Let ℜ″_i be the class of those ℨ′ which contain the ℨ in ℜ_i as subconjunctions. Then the theorem says that ℜ′_i is the same as ℜ″_i. We shall show first that this holds for those four simple forms (a), (b), (c), (d); and, then, that it holds for any sentence of one of the compound forms (e), (f), (g), provided it holds for the components occurring. Then the general theorem follows by mathematical induction with respect to n.
a. Let i be atomic. Then ℜ_i is the class of those ℨ to which i belongs (T18-1a). Likewise, ℜ′_i is the class of those ℨ′ to which i belongs, hence the same as ℜ″_i.
b. Let i be in_i = in_i. Then ℜ_i is the class of all ℨ (T18-1b). Therefore, ℜ″_i is the class of all ℨ′, hence the same as ℜ′_i.
c. Let i be in_i = in_j with two different in. Then ℜ_i is the null class (T18-1c). Therefore, ℜ″_i is the null class, hence the same as ℜ′_i.
d. Let i be 't'. This case is like (b) (T18-1d).
e. Let i be ~j. Suppose the theorem holds for j, i.e., ℜ′_j is the same as ℜ″_j. Then we shall show that it holds likewise for i. ℜ_i is the class of those ℨ which do not belong to ℜ_j (T18-1e). Likewise, ℜ′_i is the class of those ℨ′ which do not belong to ℜ′_j, which is the same as ℜ″_j. Hence ℜ′_i is the class of those ℨ′ which contain none of the sentences of ℜ_j as subconjunctions, and hence contain the sentences of ℜ_i as subconjunctions; this is the class ℜ″_i.
f. Let i be j ∨ k. Suppose the theorem holds for j and for k; i.e., ℜ′_j is the same as ℜ″_j, and ℜ′_k is the same as ℜ″_k. We shall show that it holds likewise for i. ℜ_i is ℜ_j ∪ ℜ_k (T18-1f). Likewise, ℜ′_i is ℜ′_j ∪ ℜ′_k, and hence ℜ″_j ∪ ℜ″_k. This is the class of those ℨ′ which contain as subconjunctions the sentences in ℜ_j and in ℜ_k, hence the sentences in ℜ_i; thus it is the class ℜ″_i.
g. Let i be j . k. The proof is analogous to that of (f), using T18-1g.


The following theorem T9 is analogous to T8 but deals with 𝔏_N and 𝔏_∞.


T19-9. Let i be any nongeneral sentence in 𝔏_N, and hence in 𝔏_∞. Then ℜ_i in 𝔏_∞ is the class of those ℨ_k in 𝔏_∞ for which there is a ℨ_j in ℜ_i in 𝔏_N such that the conjunctive components of ℨ_j belong to ℨ_k.


Proof analogous to that of T8.


§ 20. L-Concepts 


A sentence (or proposition) is usually regarded as logically (necessarily, analytically) true if it holds in any possible case. Therefore we define the explicatum for this vague, traditional concept in this way: i is L-true (in signs, '⊢ i') if i holds in every ℨ, hence, if ℜ_i is the universal range (D1a). We define analogously: i is L-false if i holds in no ℨ, hence, if ℜ_i is the null range (D1b); this is the case if ⊢ ~i (T1a). As an explicatum for what is known as logical or necessary implication or entailment, we define: i L-implies j if j holds in every ℨ in which i holds, hence, if ℜ_i is a subclass of ℜ_j (D1c); this is the case if ⊢ i ⊃ j (T1b). As explicatum for logical or necessary equivalence, we define: i and j are L-equivalent if they have the same range (D1d); this is the case if ⊢ i ≡ j (T1c). Further, 'L-disjunct' and 'L-exclusive' are defined (D1e, f). If a sentence is either L-true or L-false, it is called L-determinate (D3); otherwise it is called factual (D4), because its truth-value is dependent upon facts. If a sentence is true but not L-true, it is called F-true (D5a), because it is true by virtue of facts. The terms 'F-false', 'F-implies', 'F-equivalent' are defined analogously (D5b, c, d). Some theorems concerning the concepts defined are given.

We shall now introduce the L-concepts. They constitute the basis of 
deductive logic. Our method of defining these concepts bases them on the 
concept of range and hence on that of state-description. We can give here 
only short explanations. [For a more detailed discussion of the L-concepts 
see [Semantics], §§ 14 ff., 20, and [Meaning], § 2; the method is developed 
from an idea of Wittgenstein ([Tractatus], 4.463), see [Semantics], p. 107] 

The concept of L-truth is meant as an explicatum for that concept, 
frequently used but seldom exactly defined, which is variously charac- 
terized as analytical truth (Kant), necessary truth, logical truth, truth 
based on logical grounds, as distinguished from contingent, factual truth. 
It seems that we are sufficiently in agreement with at least some concep- 
tions of this explicandum if we try to make it more explicit in this way: 
a sentence (or proposition) has this kind of truth if it would be true under 
any conceivable circumstances, in other words, in any possible case. Since 
we have constructed the state-descriptions 3 in such a way that they 
represent the possible cases, it seems natural to define the explicatum 
'L-true' in this way: i is L-true if it holds in every ℨ, in other words, if ℜ_i is the universal range (D1a). We write '⊢ i' as short for 'i is L-true'; the scope of '⊢' is always the whole immediately following meta-expression for a sentence (for example, '⊢ i ∨ j' is meant as '⊢ (i ∨ j)', hence as 'the sentence i ∨ j is L-true').

The concept of L-falsity is introduced as an explicatum for logical impossibility, self-contradiction, falsity based on logical grounds, as distinguished from contingent, factual falsity. A sentence has this kind of falsity if it holds in no possible case. Therefore, we define the explicatum in this way: i is L-false if i does not hold in any ℨ, in other words, if ℜ_i is the null range (D1b).

The concept of L-implication is meant as an explicatum for necessary implication, logical implication, entailment, the converse of logical consequence or logical deducibility. It seems that this explicandum is meant as that relation which connects i and j (or the corresponding propositions) if it is impossible that i is true but j is not, in other words, if j holds in every possible case in which i holds. Therefore, we define the explicatum in this way: i L-implies j if j holds in every ℨ in which i holds, in other words, if ℜ_i is a subclass of ℜ_j (D1c).

The concept of L-equivalence is intended as an explicatum for necessary equivalence, mutual entailment, mutual logical deducibility. Therefore we define it by the identity of ranges (D1d). Then it is the same as mutual L-implication (T2i).

The concept of L-disjunctness is meant as an explicatum for that relation which holds between two or more sentences (or propositions) j_1, j_2, ..., j_n (n ≥ 2), if by logical necessity at least one of them is true, in other words, if in any possible case at least one of them holds. Therefore, we take as definiens for the explicatum this condition: in every ℨ at least one of the sentences holds, in other words, the class-sum of the ranges of the sentences is the universal range (D1e). This means the same as that the disjunction of the sentences is L-true (T1d).

The concept of L-exclusion is meant as an explicatum for logical incompatibility, logical impossibility of joint truth. This explicandum is that relation which connects the sentences i and j (or the corresponding propositions) if there is no possible case in which both of them hold. Therefore, we take as definiens for the explicatum this condition: there is no ℨ in which both sentences hold, in other words, the class-product of their ranges is null (D1f). This means the same as that the conjunction of the sentences is L-false (T1e).


+D20-1. Let i and j be sentences in a system 𝔏. The terms here defined will be applied to classes of sentences (on the basis of D18-6b) in the same way as to sentences.
a. i is L-true (⊢ i) (in 𝔏) =Df ℜ_i is V_ℨ.
b. i is L-false (in 𝔏) =Df ℜ_i is Λ_ℨ.
c. i L-implies j (j is an L-implicate of i) (in 𝔏) =Df ℜ_i ⊂ ℜ_j.
d. i is L-equivalent to j (in 𝔏) =Df ℜ_i is the same as ℜ_j.
e. j_1, j_2, ..., j_n (n ≥ 2) are L-disjunct with one another (in 𝔏) =Df ℜ(j_1) ∪ ℜ(j_2) ∪ ... ∪ ℜ(j_n) is V_ℨ.
f. i is L-exclusive of j (in 𝔏) =Df ℜ_i ∩ ℜ_j is Λ_ℨ.
g. The class of sentences 𝔎_i is (or, the sentences of 𝔎_i are) L-exclusive in pairs (in 𝔏) =Df every sentence of 𝔎_i is L-exclusive of every other sentence of 𝔎_i.
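
For a small finite system, definitions of this kind can be evaluated by listing all state-descriptions. The Python sketch below is meant only as an illustration; the two atomic sentences 'Pa' and 'Pb', the representation of a sentence as a function from a state-description to a truth-value, and all function names are assumptions introduced here, not part of the system.

```python
from itertools import product

ATOMS = ("Pa", "Pb")  # assumption: a tiny system with two atomic sentences

# One dict per state-description: it fixes the truth-value of every atom.
STATES = [dict(zip(ATOMS, vals))
          for vals in product((True, False), repeat=len(ATOMS))]
UNIVERSAL = frozenset(range(len(STATES)))  # universal range
NULL = frozenset()                         # null range


def rng(sentence):
    """Range of a sentence: the state-descriptions (by index) in which it holds."""
    return frozenset(n for n, z in enumerate(STATES) if sentence(z))


def l_true(i):
    return rng(i) == UNIVERSAL


def l_false(i):
    return rng(i) == NULL


def l_implies(i, j):
    return rng(i) <= rng(j)  # subclass, equality included


def l_equivalent(i, j):
    return rng(i) == rng(j)


def l_disjunct(*js):
    return frozenset().union(*(rng(j) for j in js)) == UNIVERSAL


def l_exclusive(i, j):
    return (rng(i) & rng(j)) == NULL


pa = lambda z: z["Pa"]
pb = lambda z: z["Pb"]
not_pa = lambda z: not z["Pa"]
pa_and_pb = lambda z: z["Pa"] and z["Pb"]
```

With these names, `l_implies(pa_and_pb, pa)` and `l_exclusive(pa, not_pa)` come out true, while `l_implies(pa, pb)` does not, since there is a state-description in which 'Pa' holds and 'Pb' does not.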




The following theorem T1 is based on D1. It states sufficient and necessary conditions for the L-concepts just defined, except L-truth, in terms of the L-truth of certain sentences. Often these conditions are more convenient than those in D1 in terms of ranges. Therefore we shall often make use of this theorem (usually without explicit reference). We shall often write '⊢ i ⊃ j' as a convenient abbreviation for 'i L-implies j'; likewise '⊢ i ≡ j' for 'i is L-equivalent to j'. In this and many of the subsequent theorems, we omit for brevity the reference 'in the system 𝔏'; it is understood that any L-term or F-term used is meant with respect to any finite or infinite system 𝔏, unless otherwise indicated.


+T20-1. Theorem of the L-concepts, with respect to sentences.
a. i is L-false if and only if ⊢ ~i.
b. i L-implies j if and only if ⊢ i ⊃ j.
c. i is L-equivalent to j if and only if ⊢ i ≡ j.
d. j_1, j_2, ..., j_n (n ≥ 2) are L-disjunct with one another if and only if ⊢ j_1 ∨ j_2 ∨ ... ∨ j_n.
e. i is L-exclusive of j if and only if ⊢ ~(i . j), hence if and only if ⊢ i ⊃ ~j, hence if and only if ⊢ j ⊃ ~i.

T1 states characteristic sentences for the L-concepts (compare [Semantics] § 22), that is to say, sentences whose L-truth is a sufficient and necessary condition for the corresponding L-concepts. Now we shall often find situations (especially in the theory of the degree of confirmation on a given evidence e) where one of these characteristic sentences is not L-true but is L-implied by a given sentence e. In these cases we shall often use the L-term in question in a relative way with respect to e. The following definition introduces the relative L-terms corresponding to T1c, d, e.


D20-2. Let e, i, and j be sentences in 𝔏.
a. i and j are L-equivalent (to one another) with respect to e (in 𝔏) =Df ⊢ e ⊃ (i ≡ j).
b. j_1, j_2, ..., j_n (n ≥ 2) are L-disjunct (with one another) with respect to e (in 𝔏) =Df ⊢ e ⊃ j_1 ∨ j_2 ∨ ... ∨ j_n.
c. i and j are L-exclusive (of one another) with respect to e (in 𝔏) =Df ⊢ e ⊃ ~(i . j) (hence ⊢ e ⊃ ~i ∨ ~j, ⊢ e . i ⊃ ~j, e . i . j is L-false).
d. The class of sentences 𝔎_i is (or, the sentences of 𝔎_i are) L-exclusive in pairs with respect to e (in 𝔏) =Df every sentence in 𝔎_i is L-exclusive with respect to e of every other sentence in 𝔎_i.
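
The relative L-terms can be illustrated in the same brute-force manner: each is the L-truth of a conditional with e as antecedent. The following Python sketch rests on assumptions made only for illustration (two atomic sentences 'Pa' and 'Pb', sentences represented as functions of a state-description, hypothetical function names).

```python
from itertools import product

ATOMS = ("Pa", "Pb")  # assumption: a tiny system with two atomic sentences


def holds_everywhere(sentence):
    """L-truth by enumeration: the sentence holds in every state-description."""
    return all(sentence(dict(zip(ATOMS, vals)))
               for vals in product((True, False), repeat=len(ATOMS)))


def l_equivalent_wrt(e, i, j):
    # e > (i = j) is L-true
    return holds_everywhere(lambda z: (not e(z)) or (i(z) == j(z)))


def l_disjunct_wrt(e, *js):
    # e > j1 v j2 v ... v jn is L-true
    return holds_everywhere(lambda z: (not e(z)) or any(j(z) for j in js))


def l_exclusive_wrt(e, i, j):
    # e > ~(i . j) is L-true
    return holds_everywhere(lambda z: (not e(z)) or not (i(z) and j(z)))


pa = lambda z: z["Pa"]
pb = lambda z: z["Pb"]
taut = lambda z: True                            # plays the part of 't'
same = lambda z: z["Pa"] == z["Pb"]              # evidence: Pa = Pb
either = lambda z: z["Pa"] or z["Pb"]            # evidence: Pa v Pb
not_both = lambda z: not (z["Pa"] and z["Pb"])   # evidence: ~(Pa . Pb)
```

Taking the tautology as evidence gives back the absolute concepts: 'Pa' and 'Pb' are not L-exclusive with respect to `taut`, but they are L-exclusive with respect to `not_both`.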




In T2, some elementary theorems concerning L-concepts are listed, which hold in any finite or infinite system 𝔏. (For proofs of these and related theorems see [Semantics] §§ 20 and 14.)

T20-2.
a. 𝔎_i is L-true if and only if every sentence of 𝔎_i is L-true.
b. If ⊢ i ⊃ j and ⊢ j ⊃ k, then ⊢ i ⊃ k. Analogously for classes of sentences.
c. If ⊢ i ⊃ j and ⊢ i, then ⊢ j.
d. If j ∈ 𝔎_i, then 𝔎_i L-implies j.
e. If 𝔎_j ⊂ 𝔎_i, then 𝔎_i L-implies 𝔎_j.
f. i (or 𝔎_i) L-implies 𝔎_j if and only if i (or 𝔎_i, respectively) L-implies every sentence of 𝔎_j.
g. If ⊢ j, then for every i, ⊢ i ⊃ j.
h. If ⊢ ~i, then for every j, ⊢ i ⊃ j.
i. ⊢ i ≡ j if and only if ⊢ i ⊃ j and ⊢ j ⊃ i.
j. The null class of sentences is L-true.
k. If i is a conjunction with n components (n ≥ 1) and 𝔎_i is the class of these n components, then i and 𝔎_i are L-equivalent. (From T18-1g and i.)
l. ⊢ i ⊃ j if and only if ⊢ i . j ≡ i.
m. If ⊢ i ∨ j (hence, i and j are L-disjunct) and ⊢ ~(i . j) (hence, i and j are L-exclusive), then ⊢ i ≡ ~j and ⊢ ~i ≡ j.
o. {i ⊃ j, i} L-implies j.
p. A conjunction with n components (n ≥ 1) is L-true if and only if every component of it is L-true.
. A disjunction with n components (n ≥ 1) is L-false if and only if every component of it is L-false.
s. If a subclass of 𝔎_i is L-false, then 𝔎_i is L-false.
. Let ℨ_i be a ℨ in 𝔏. j holds in ℨ_i (or, in other words, ℨ_i ∈ ℜ_j) if and only if ℨ_i L-implies j in 𝔏.

Proof. j holds in ℨ_i if and only if ℨ_i ∈ ℜ_j; hence if and only if ℜ(ℨ_i) ⊂ ℜ_j (T19-6), hence if and only if ℨ_i L-implies j in 𝔏 (D1c).


If i is either L-true or L-false, we can determine its truth-value by logical, that is, semantical, analysis on the basis of the rules of ranges. Therefore, we call i in this case L-determinate (D3). Otherwise, i.e., if i is neither L-true nor L-false, we call it factual (D4), because in this case we need knowledge of the relevant facts, in addition to the interpretation of the sentence, in order to find its truth-value. If i is true but not L-true, in other words, if i is true and factual, we call it F-true (for 'factually true') (D5a). Other F-terms are defined analogously (D5b, c, d); they will be used only rarely. All these terms are applied to classes of sentences in the same way as here defined for sentences.


+D20-3. i is L-determinate (in 𝔏) =Df i is either L-true or L-false.
+D20-4. i is factual in 𝔏 =Df i is a sentence in 𝔏 and not L-determinate.
D20-5.
a. i is F-true (in 𝔏) =Df i is true but not L-true.
b. i is F-false (in 𝔏) =Df i is false but not L-false.
c. i F-implies j (in 𝔏) =Df the sentence i ⊃ j is F-true.
d. i is F-equivalent to j (in 𝔏) =Df the sentence i ≡ j is F-true.
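
The difference between the L-terms and the F-terms is that the latter need, in addition, a statement of which state-description is the true one. A minimal Python sketch follows, under assumptions chosen only for illustration: two atomic sentences, sentences given as functions of a state-description, and an arbitrarily chosen state-description ACTUAL standing in for the facts.

```python
from itertools import product

ATOMS = ("Pa", "Pb")  # assumption: a tiny system with two atomic sentences
STATES = [dict(zip(ATOMS, vals))
          for vals in product((True, False), repeat=len(ATOMS))]
ACTUAL = {"Pa": True, "Pb": False}  # assumption: the true state-description


def classify(sentence):
    """Return 'L-true', 'L-false', or 'factual' by inspecting every state-description."""
    values = [sentence(z) for z in STATES]
    if all(values):
        return "L-true"
    if not any(values):
        return "L-false"
    return "factual"


def f_true(sentence, actual=ACTUAL):
    """True in the actual state-description, but not L-true."""
    return sentence(actual) and classify(sentence) != "L-true"


def f_false(sentence, actual=ACTUAL):
    """False in the actual state-description, but not L-false."""
    return (not sentence(actual)) and classify(sentence) != "L-false"


pa = lambda z: z["Pa"]
pb = lambda z: z["Pb"]
```

Here `classify` needs no knowledge of ACTUAL, which mirrors the remark that the truth-value of an L-determinate sentence can be found by semantical analysis alone.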


Some theorems on factuality follow. 


T20-4. i (or 𝔎_i) is factual if and only if ℜ_i (or ℜ(𝔎_i), respectively) is neither V_ℨ nor Λ_ℨ.

T20-5. If i or 𝔎_i fulfils one of the following conditions (a) to (g), it is factual.
a. i is a basic sentence. (From T19-3, T4.)
b. i is a ℨ in 𝔏_N. (From T19-6, T4.)
c. 𝔎_i is a ℨ in 𝔏_∞. (Like (b).)
d. 𝔎_i is a non-empty subclass of any ℨ_j in 𝔏_∞.

Proof. 𝔎_i is not L-true (T2a, (a)) and not L-false (T2s, (c)).

e. i is a subconjunction with n components (n ≥ 1) of any ℨ_j in 𝔏_N. (Analogous to (d), from (a), (b), T2k.)
f. 𝔎_i is a non-empty class of basic sentences which does not contain a basic pair. (From (d), (e), T2k.)
g. i is a conjunction of n basic sentences (n ≥ 1) not including any atomic sentence together with its negation. (From (d), (e), T2k.)

T20-6. i is factual if and only if ~i is factual. (From T1a.)


The following theorems T8-T11 speak about L-concepts with respect to sentences in different systems.

T20-8. Let i and j be any nongeneral sentences (D16-6g) in 𝔏_N, and hence also in 𝔏_{N+m} for any m.
a. i is L-true in 𝔏_{N+m} if and only if it is L-true in 𝔏_N. (From T19-8, by mathematical induction with respect to m.)
b. i is L-false in 𝔏_{N+m} if and only if it is L-false in 𝔏_N. (From (a).)
c. Each of the relations of L-implication, L-equivalence, L-disjunctness, L-exclusion holds for the pair i, j in 𝔏_{N+m} if and only if it does in 𝔏_N. (From (a).)
d. i is factual in 𝔏_{N+m} if and only if it is factual in 𝔏_N. (From (a), (b).)

T20-9. Let i and j be any nongeneral sentences in 𝔏_N, and hence also in 𝔏_∞.
a. i is L-true in 𝔏_∞ if and only if it is L-true in 𝔏_N. (From T19-9.)
b. i is L-false in 𝔏_∞ if and only if it is L-false in 𝔏_N. (From (a).)
c. Each of the relations of L-implication, L-equivalence, L-disjunctness, L-exclusion holds for the pair i, j in 𝔏_∞ if and only if it does in 𝔏_N. (From (a).)
d. i is factual in 𝔏_∞ if and only if it is factual in 𝔏_N. (From (a), (b).)




T20-10. Let i and j be any nongeneral sentences in any finite or infinite system 𝔏.

a. If i is L-true in any system, then it is L-true in every system in which it occurs. (From T8a, T9a.)

b. If i is L-false (or L-determinate or factual, respectively) in any system, then it is likewise in every system in which it occurs. (From (a).)

c. If L-implication (or L-equivalence, L-disjunctness, L-exclusion) holds for the pair i, j in any system, then likewise in every system in which i and j occur. (From (a).)


T10 does not hold for all sentences. Counterexamples can easily be found among general sentences containing '='. For instance, let i be '(x)(y)(z)(x = y ∨ x = z ∨ y = z)', and j '(∃x)(∃y)(x ≠ y)'; i says that there are at most two individuals, and j says that there are at least two individuals. Since neither sentence contains any individual constant (in), both occur in all systems. However, i is L-true in 𝔏_1 and 𝔏_2 only, but L-false in all other systems. j is L-false in 𝔏_1 only, but L-true in all other systems. Hence their conjunction i . j is L-true in 𝔏_2 only, but L-false in all other systems. There are sentences which, if we run through the sequence of systems 𝔏_1, 𝔏_2, etc., are first L-true in some systems, then not L-true in some following systems, then again L-true in some systems, and so on in a never ending oscillation. (Examples will be given in Vol. II, in connection with m* for 𝔏_∞; likewise counterexamples for the converse of T11.) A sentence of this kind cannot be L-true in 𝔏_∞; this is stated by T11. (On 'final segment' see D40-5.)
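
The behavior of the two identity sentences in the finite systems can be verified by direct enumeration. The Python sketch below is an illustration under two assumptions: the individuals of the N-th system are represented by the numbers 0, ..., N-1, and distinct individual constants designate distinct individuals, so that an identity sentence with two different constants holds in no state-description. Since neither sentence contains a predicate, its truth-value is the same in every state-description of a given system, and being true then coincides with being L-true in that system. The function names are hypothetical.

```python
from itertools import product


def at_most_two(n):
    """(x)(y)(z)(x = y v x = z v y = z), evaluated over n individuals."""
    d = range(n)
    return all(x == y or x == z or y == z for x, y, z in product(d, repeat=3))


def at_least_two(n):
    """(Ex)(Ey)(x != y), evaluated over n individuals."""
    d = range(n)
    return any(x != y for x, y in product(d, repeat=2))


def both(n):
    """The conjunction of the two sentences: exactly two individuals."""
    return at_most_two(n) and at_least_two(n)


# Systems (by number of individuals) in which each sentence is L-true.
systems = range(1, 7)
l_true_i = [n for n in systems if at_most_two(n)]
l_true_j = [n for n in systems if at_least_two(n)]
l_true_conj = [n for n in systems if both(n)]
```

The enumeration stops at six individuals only for convenience; the pattern for larger systems is the one stated in the text.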

T20-11.

a. If i is L-true in 𝔏_∞, then i is L-true in a final segment of the sequence of the systems 𝔏_N (in other words, there is an m such that i is L-true in every system 𝔏_N with N ≥ m).

b. Analogues to (a) hold for the concepts mentioned in T10b and c with respect to i or the pair i, j, respectively.

The converse of T11 does not hold generally.


It may be remarked incidentally that T11 holds for the systems 𝔏 only because these do not contain attribute variables. Let j be the sentence
'(∃x)(~Px) . (x)[Px ⊃ (∃y)(z)(Rxz ≡ (z = y))] . (y)(∃x)[Px . (z)(Rzy ≡ (z = x))]'.
This sentence says that not all individuals are P, and R is a one-to-one correspondence between those individuals which are P and all individuals; in other words, R maps the whole universe of individuals on a proper part of it. This is possible, of course, only for the infinite universe. Therefore j is factual in 𝔏_∞ but L-false in any finite system 𝔏_N. In systems 𝔏′ with attribute variables in quantifiers, a sentence j′ can be constructed which follows from j by existential generalization ('there is a property F and a relation H such that . . .'). This sentence j′ is, like j, L-false in every 𝔏′_N; but it is, in distinction to j, L-true in 𝔏′_∞. Hence, T11a does not hold for the systems 𝔏′. Since j′ is L-true only in 𝔏′_∞ and otherwise L-false, it may be taken as a formulation of the infinity condition for the domain of individuals. For a discussion of infinity conditions see Hilbert and Bernays [Grundlagen], I, 213 and 209 ff. Each infinity condition corresponds to the satisfiability ("Erfüllbarkeit", cf. [Formalization], p. xi) of a matrix with free attribute variables, and hence it can be formulated by a sentence only with the help of quantifiers with attribute variables. Therefore, sentences of this kind cannot occur in our systems 𝔏.


This ends the first part of this chapter, which alone is a prerequisite 
for the next chapter (see § 14). 


§ 21. Theorems of Propositional Logic 


A. Some elementary theorems of propositional logic are listed for later reference. Most of them are well known. B. On the basis of the theorems under (A) some theorems on truth-tables (T7) and state-descriptions (T8) are proved. C. The disjunctive and the conjunctive normal forms of sentences are defined, and theorems for them are given.


In §§ 21-24 theorems on L-concepts are listed for later use. They are based on the definitions and theorems of §§ 18-20. Most of them are well-known theorems of propositional and lower functional logic. In most cases it seems unnecessary to give proofs or back references.


A. Elementary Theorems of Propositional Logic 


The theorems of this section are theorems of propositional logic; this means that they apply L-concepts to sentences on the basis of the way in which these sentences are constructed with the help of connectives. Any sentence asserted to be L-true in these theorems from T3 on can easily be shown to be tautologous on the basis of the ordinary truth-tables, and is therefore L-true in 𝔏 (T1).

D21-1. Let i be a sentence in a system 𝔏. i is tautologous =Df i is constructed out of components j_1, j_2, ..., j_n (n ≥ 1) with the help of connectives such that the following two conditions (a) and (b) are fulfilled.

a. Every j-component is not a negation, disjunction, or conjunction of other sentences but may have any other form whatever, including the universal form.

b. For every possible distribution of the truth-values T and F among the components, where 't' always has the value T, the truth-value of i determined on the basis of the ordinary truth-tables for the connectives is always T.

T21-1. Every tautologous sentence in 𝔏 is L-true in 𝔏.


Proof. The rules for the connectives in 𝔏 are in accordance with the ordinary truth-tables, see D18-4d, e, f, and, based upon it, T18-1e, f, g. (This holds likewise for our definitions A15-1a, b for '⊃' and '≡'; therefore, if a sentence containing these signs is given, we need not eliminate them before applying T1 but may instead use the ordinary truth-tables for these signs.)
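
The test for being tautologous is a finite procedure and can be sketched in a few lines of Python. The representation chosen here is an assumption made for illustration only: a component is a string, 't' is the string "t", and a compound is a tuple whose first element names the connective ("~", "v", ".", ">" for '⊃', "=" for '≡'). The function names are likewise hypothetical.

```python
from itertools import product


def components(f):
    """The components of a formula other than 't'."""
    if isinstance(f, str):
        return set() if f == "t" else {f}
    return set().union(*(components(g) for g in f[1:]))


def value(f, row):
    """Truth-value of f for one distribution of truth-values (row) among the components."""
    if isinstance(f, str):
        return True if f == "t" else row[f]  # 't' always has the value T
    op, args = f[0], [value(g, row) for g in f[1:]]
    if op == "~":
        return not args[0]
    if op == "v":
        return args[0] or args[1]
    if op == ".":
        return args[0] and args[1]
    if op == ">":
        return (not args[0]) or args[1]
    if op == "=":
        return args[0] == args[1]
    raise ValueError(f"unknown connective: {op}")


def tautologous(f):
    """True if f has the value T for every distribution of truth-values."""
    names = sorted(components(f))
    return all(value(f, dict(zip(names, vals)))
               for vals in product((True, False), repeat=len(names)))


excluded_middle = ("v", "i", ("~", "i"))
syllogism = (">", (".", (">", "i", "j"), (">", "j", "k")), (">", "i", "k"))
```

Both `tautologous(excluded_middle)` and `tautologous(syllogism)` hold, whereas `tautologous((">", "i", "j"))` does not. Note that '⊃' and '≡' are evaluated directly by their own truth-tables, in line with the parenthetical remark in the proof.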


T21-3. L-true sentences in 𝔏.

a. ⊢ t.
b. ⊢ i ∨ ~i.
c. ⊢ i ⊃ i; i.e., ⊢ ~i ∨ i.
d. ⊢ ~(i . ~i).


T21-4. L-true sentences with '⊃' in 𝔏. Each of the following items (a) to (s) states three theorems:

A. Any sentence of the form described is L-true (as indicated by '⊢') (T1).

B. The antecedent of the main connective '⊃' L-implies the consequent (T20-1b) (for example, in (a): i L-implies i ∨ j).

C. Let the matrix M_i be constructed from a sentence here described by replacing the components i, j, etc., by any matrices. Then ⊢ ( )(M_i) (T22-4). (For example, from (a): ⊢ (x)(P_1x ⊃ P_1x ∨ P_2x).)


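a. ⊢ i ⊃ i ∨ j.
b. ⊢ i . j ⊃ i.
c. ⊢ (i ⊃ j) . (j ⊃ k) ⊃ (i ⊃ k).
d. ⊢ (i ≡ j) . (j ≡ k) ⊃ (i ≡ k).
e. ⊢ (i ⊃ j) ⊃ (i ∨ k ⊃ j ∨ k).
f. ⊢ (i ⊃ j) ⊃ (i . k ⊃ j . k).
g. ⊢ ~(i ⊃ j) ⊃ i.
h. ⊢ ~(i ⊃ j) ⊃ ~j.
i. ⊢ (i ≡ j) . i ⊃ j.
j. ⊢ (i ≡ j) . ~i ⊃ ~j.
k. ⊢ (i ≡ j) ⊃ (i ⊃ j).
l. ⊢ (i ≡ j) ⊃ (j ⊃ i).
m. ⊢ (i ∨ j) . ~i ⊃ j.
n. ⊢ (i ⊃ j) . (k ⊃ l) ⊃ (i ∨ k ⊃ j ∨ l).
r. ⊢ i ⊃ t.
s. ⊢ ~t ⊃ i.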


T21-5. Sentences with '≡'. Each of the following items (a) to (u)(2) states five theorems:

A. Any sentence of the form described is L-true (as indicated by '⊢').
B. The two components of the main connective '≡' are L-equivalent (T20-1c) (for example, in (e)(1): i ∨ j is L-equivalent to j ∨ i).
C. The two components are L-interchangeable (T23-1b).
D. Let M_i ≡ M_j be constructed from any sentence here described by putting any matrices in the place of the components i, j, etc. Then ⊢ ( )(M_i ≡ M_j) (T22-4).
E. M_i and M_j are L-interchangeable. (T23-2c).

a. ⊢ i ≡ i.
b. ⊢ i ≡ ~~i.
c. ⊢ i ≡ i ∨ i.
d. ⊢ i ≡ i . i.
e. Principles of commutation.
(1) ⊢ i ∨ j ≡ j ∨ i.
(2) ⊢ i . j ≡ j . i.
(3) ⊢ (i ≡ j) ≡ (j ≡ i).

f. Principles of duality.
(1) ⊢ ~(i ∨ j) ≡ ~i . ~j.
(2) ⊢ ~(i_1 ∨ i_2 ∨ ... ∨ i_n) ≡ ~i_1 . ~i_2 . ... . ~i_n.
(3) ⊢ ~(i . j) ≡ ~i ∨ ~j.
(4) ⊢ ~(i_1 . i_2 . ... . i_n) ≡ ~i_1 ∨ ~i_2 ∨ ... ∨ ~i_n.
g. Principles of negation.
(1) ⊢ ~(i . ~j) ≡ (i ⊃ j).
(2) ⊢ ~(i ⊃ j) ≡ i . ~j.
(3) ⊢ ~(i ≡ j) ≡ (i ≡ ~j).
(4) ⊢ ~(i ≡ j) ≡ (i . ~j) ∨ (~i . j).
h. Principles of transposition.
(1) ⊢ (i ⊃ j) ≡ (~j ⊃ ~i).
(2) ⊢ (~i ⊃ j) ≡ (~j ⊃ i).
(3) ⊢ (i ⊃ ~j) ≡ (j ⊃ ~i).
(4) ⊢ (i ≡ j) ≡ (~i ≡ ~j).
(5) ⊢ (i ≡ ~j) ≡ (~i ≡ j).
(6) ⊢ (i . j ⊃ k) ≡ (i . ~k ⊃ ~j).
(7) ⊢ (i ⊃ j ∨ k) ≡ (i . ~j ⊃ k).
(8) ⊢ (i ⊃ ~j ∨ k) ≡ (i . j ⊃ k).
i. (1) ⊢ (i ⊃ j) ≡ (i ≡ i . j).
(2) ⊢ (i ⊃ j) ≡ (i ⊃ i . j).
(3) ⊢ (i ⊃ j) ≡ (j ≡ i ∨ j).
(4) ⊢ (i ⊃ j) ≡ (i ∨ j ⊃ j).
j. (1) ⊢ i ≡ (i ∨ j) . (i ∨ ~j).
(2) ⊢ i ≡ (i . j) ∨ (i . ~j).
k. (1) ⊢ (i ⊃ (j ⊃ k)) ≡ (i . j ⊃ k).
(2) ⊢ (i ⊃ (j ⊃ k)) ≡ (j ⊃ (i ⊃ k)).
. Principles of association.
(1) ⊢ (i ∨ j) ∨ k ≡ i ∨ (j ∨ k).
(2) ⊢ (i . j) . k ≡ i . (j . k).
. Principles of distribution.
(1) ⊢ i . (j ∨ k) ≡ (i . j) ∨ (i . k).
(2) ⊢ i . (j_1 ∨ j_2 ∨ ... ∨ j_n) ≡ (i . j_1) ∨ (i . j_2) ∨ ... ∨ (i . j_n).
(3) ⊢ (i_1 ∨ i_2 ∨ ... ∨ i_m) . (j_1 ∨ j_2 ∨ ... ∨ j_n) ≡ (i_1 . j_1) ∨ (i_1 . j_2) ∨ ... ∨ (i_1 . j_n) ∨ (i_2 . j_1) ∨ ... ∨ (i_m . j_1) ∨ (i_m . j_2) ∨ ... ∨ (i_m . j_n), where at the right-hand side conjunctions for all pairs of an i-sentence and a j-sentence occur.
(4) ⊢ i ∨ (j . k) ≡ (i ∨ j) . (i ∨ k).
(5) ⊢ i ∨ (j_1 . j_2 . ... . j_n) ≡ (i ∨ j_1) . (i ∨ j_2) . ... . (i ∨ j_n).
(6) ⊢ (i_1 . i_2 . ... . i_m) ∨ (j_1 . j_2 . ... . j_n) ≡ (i_1 ∨ j_1) . (i_1 ∨ j_2) . ... . (i_1 ∨ j_n) . (i_2 ∨ j_1) . ... . (i_m ∨ j_1) . (i_m ∨ j_2) . ... . (i_m ∨ j_n), analogous to (3).
(7) ⊢ i ∨ (j ≡ k) ≡ (i ∨ j ≡ i ∨ k).
(8) ⊢ i ⊃ j . k ≡ (i ⊃ j) . (i ⊃ k).
(9) ⊢ i ⊃ j_1 . j_2 . ... . j_n ≡ (i ⊃ j_1) . (i ⊃ j_2) . ... . (i ⊃ j_n).
(10) ⊢ i ⊃ j ∨ k ≡ (i ⊃ j) ∨ (i ⊃ k).
(11) ⊢ i ⊃ j_1 ∨ j_2 ∨ ... ∨ j_n ≡ (i ⊃ j_1) ∨ (i ⊃ j_2) ∨ ... ∨ (i ⊃ j_n).
(12) ⊢ i ⊃ (j ⊃ k) ≡ (i ⊃ j) ⊃ (i ⊃ k).
(13) ⊢ i ⊃ (j ≡ k) ≡ [(i ⊃ j) ≡ (i ⊃ k)].

(1) ⊢ (i . j ⊃ k) ≡ (i ⊃ k) ∨ (j ⊃ k).
(2) ⊢ (i_1 . i_2 . ... . i_n ⊃ k) ≡ (i_1 ⊃ k) ∨ (i_2 ⊃ k) ∨ ... ∨ (i_n ⊃ k).
(3) ⊢ (i ∨ j ⊃ k) ≡ (i ⊃ k) . (j ⊃ k).
(4) ⊢ (i_1 ∨ i_2 ∨ ... ∨ i_n ⊃ k) ≡ (i_1 ⊃ k) . (i_2 ⊃ k) . ... . (i_n ⊃ k).

. ⊢ [i ⊃ (j ≡ k)] ≡ (i . j ≡ i . k).
. (1) ⊢ i ≡ i ∨ (i . j).
(2) ⊢ i ≡ i . (i ∨ j).
(3) ⊢ i ∨ j ≡ i ∨ (j . ~i).
(4) ⊢ i . j ≡ i . (j ∨ ~i).
(5) bing =i 5). 
Disjunctions with 't'.
(1) ⊢ i ∨ t ≡ t.
(2) ⊢ i ∨ ~t ≡ i.
Conjunctions with 't'.
(1) ⊢ i . t ≡ i.
(2) ⊢ i . ~t ≡ ~t.
. Conditionals with 't'.
(1) ⊢ (t ⊃ i) ≡ i.
(2) ⊢ (~t ⊃ i) ≡ t.
(3) ⊢ (i ⊃ t) ≡ t.
(4) ⊢ (i ⊃ ~t) ≡ ~i.
u. Biconditionals with 't'.
(1) ⊢ (i ≡ t) ≡ i.
(2) ⊢ (i ≡ ~t) ≡ ~i.

B. Theorems on Truth-Tables

The truth-table for n sentences i_1, ..., i_n has 2^n lines (T40-31f). Each line may be represented by a conjunction of n components. Let these conjunctions be k_1, ..., k_m (m = 2^n). k_1 is i_1 . i_2 . ... . i_n; every other conjunction is formed from k_1 by replacing some components with their negations. For any sentence j constructed out of some of the i-sentences with the help of connectives, its truth-table can be constructed in the customary way; it states one of the truth-values T or F for each of the m lines. The following diagram shows an example for n = 3, m = 8. If j has the value F for every line, it is L-false; otherwise j is L-equivalent to the disjunction of those k-sentences for which it has the value T. (In the example given in the last column of the accompanying table, j is L-equivalent to k_1 ∨ k_3 ∨ k_7.)


Truth-table for three components

i_1  i_2  i_3    Conjunctions                    j: (i_1 ∨ ~i_2) . i_3
 T    T    T     k_1: i_1 . i_2 . i_3                     T
 T    T    F     k_2: i_1 . i_2 . ~i_3                    F
 T    F    T     k_3: i_1 . ~i_2 . i_3                    T
 T    F    F     k_4: i_1 . ~i_2 . ~i_3                   F
 F    T    T     k_5: ~i_1 . i_2 . i_3                    F
 F    T    F     k_6: ~i_1 . i_2 . ~i_3                   F
 F    F    T     k_7: ~i_1 . ~i_2 . i_3                   T
 F    F    F     k_8: ~i_1 . ~i_2 . ~i_3                  F



The assertions of the following theorem T7 are well known. They follow from the earlier mentioned theorems of propositional logic.

T21-7. Let i_1, i_2, ..., i_n be any sentences. Let k_1 be their conjunction i_1 . i_2 . ... . i_n; let k_2, ..., k_m (m = 2^n) be formed from k_1 by replacing one or more of the components with their negations. Let j be any sentence constructed out of i-sentences with the help of connectives. Let 𝔎_j be the class of those k-sentences which correspond to the lines of the truth-table for which j has the value T. Then the following holds.

a. The m k-sentences are L-exclusive in pairs.
b. The m k-sentences are L-disjunct, that is, ⊢ k_1 ∨ k_2 ∨ ... ∨ k_m.
c. If 𝔎_j is empty, j is L-false.
d. If 𝔎_j is not empty, j is L-equivalent to the disjunction of the sentences of 𝔎_j.
e. If 𝔎_j contains all k-sentences, j is L-true.
f. If 𝔎_i is any non-empty class of k-sentences and h is a disjunction of the sentences in 𝔎_i, then the truth-table for h has the value T on just those lines which correspond to the k-sentences in 𝔎_i.
g. If 𝔎_j does not contain all the k-sentences, then j is L-equivalent to the conjunction of the negations of those k-sentences which do not belong to 𝔎_j.
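
The passage from a truth-table to the disjunction of k-sentences can be carried out mechanically. The following Python sketch is only an illustration: the names rows, k_sentence, and true_lines are hypothetical, the components are written i1, i2, i3, and the example sentence j is taken to be (i_1 ∨ ~i_2) . i_3, which is an assumption about the form of the example in the table above.

```python
from itertools import product


def rows(n):
    """The 2^n lines of the truth-table, in the customary order (all T first)."""
    return list(product((True, False), repeat=n))


def k_sentence(row):
    """The conjunction representing one line, written with '~' and '.'."""
    return " . ".join(("" if v else "~") + f"i{p + 1}" for p, v in enumerate(row))


def true_lines(j, n):
    """1-based numbers of the lines on which j has the value T."""
    return [m + 1 for m, row in enumerate(rows(n)) if j(*row)]


# Assumed example: j is (i1 v ~i2) . i3
j = lambda i1, i2, i3: (i1 or not i2) and i3

lines = true_lines(j, 3)
dnf = " v ".join("(" + k_sentence(rows(3)[m - 1]) + ")" for m in lines)

# Exactly one k-sentence holds on each line: the k-sentences are
# L-exclusive in pairs and L-disjunct.
exactly_one = all(sum(r == other for other in rows(3)) == 1 for r in rows(3))
```

For this j the list `lines` is [1, 3, 7], and `dnf` spells out the corresponding disjunction of the first, third, and seventh k-sentences.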


If we take as the i-sentences all the atomic sentences in 𝔏_N, then the k-sentences become (by rearranging their conjunctive components in the lexicographical order) the state-descriptions (ℨ) in 𝔏_N; and, in the same way, 𝔎_j becomes ℜ_j. Thus we obtain the following theorems about ℨ.


+T21-8. For 𝔏_N.

a. Any two distinct ℨ are L-exclusive. (From T7a.)

b. All the ℨ in 𝔏_N are L-disjunct, that is, their disjunction is L-true. (From T7b.)

c. If j is not L-false, then j is L-equivalent to the disjunction of the ℨ in ℜ_j. (From T7d.)

d. If 𝔎_i is any non-empty class of ℨ, and h is a disjunction of the ℨ in 𝔎_i, then ℜ_h is the same as 𝔎_i. (From T7f.)

e. For every class 𝔎_i, finite or infinite, of sentences in 𝔏_N, there is a sentence h L-equivalent to 𝔎_i.


Proof. The number of atomic sentences, and hence that of all ℨ in 𝔏_N, is finite. Therefore, ℜ(𝔎_i) is finite even if 𝔎_i is infinite. We take ℜ(𝔎_i) to be non-empty; otherwise 𝔎_i is L-false and hence '~t' is L-equivalent to it. Let h be a disjunction of the ℨ in ℜ(𝔎_i). Then ℜ_h is the same as ℜ(𝔎_i) (d). Hence h is L-equivalent to 𝔎_i.


f. If j is not L-true, then j is L-equivalent to the conjunction of the negations of those ℨ which do not belong to ℜ_j, in other words, those ℨ which belong to ℜ(~j). (From T7g.)

g. The negations of any two distinct ℨ are L-disjunct.


Proof. ⊢ ~(ℨ_i . ℨ_j) (a). Hence ⊢ ~ℨ_i ∨ ~ℨ_j.
C. Normal Forms 


D21-2. Let i be any sentence in £y or any nongeneral sentence in fo. 
jis a sentence of disjunctive normal form corresponding toi = ps j is formed 
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from i by applying the following rules (a) to (p), until none of them is 
k applicable any more; at any time the first of the rules which is applicable 
{ is to be applied; the rules apply to any part of the sentence in question 


(for example, (c) applies to any part which has the form of an identity 
sentence with two occurrences of the same in). 


a. (For $\mathfrak{L}_N$ only.) The first quantifier together with its scope $\mathfrak{M}_i$ is
replaced (1) if the quantifier is universal, by the conjunction of the N
instances of $\mathfrak{M}_i$, (2) if the quantifier is existential, by the disjunction
of the instances.

b. The sentence is expanded into primitive notation (that is, every
defined expression occurring is eliminated with the help of its
definition).

c. $in_i = in_i$ is replaced by ‘t’.

d. $in_i = in_j$ with two different in is replaced by ‘~t’.

e. $\sim\sim k$ is replaced by k.

f. If the same sentence occurs more than once as a component in a
conjunction or in a disjunction, then all occurrences except the last one
are omitted.

g. A disjunction containing a sentence and its negation as components
is replaced by ‘t’.

h. A conjunction containing a sentence and its negation as components
is replaced by ‘~t’.

i. If a disjunction contains ‘t’ as a component, then the whole
disjunction is replaced by ‘t’.

j. If a disjunction contains ‘~t’ as a component, then this component
is omitted.

k. If a conjunction contains ‘t’ as a component, then this component
is omitted.

l. If a conjunction contains ‘~t’ as a component, then the whole
conjunction is replaced by ‘~t’.

m. $\sim(k_1 \lor k_2 \lor \ldots \lor k_n)$ ($n \geq 2$) is replaced by
$\sim k_1 \,.\, \sim k_2 \,.\, \ldots \,.\, \sim k_n$.

n. $\sim(k_1 \,.\, k_2 \,.\, \ldots \,.\, k_n)$ ($n \geq 2$) is replaced by
$\sim k_1 \lor \sim k_2 \lor \ldots \lor \sim k_n$.

o. If a disjunction contains $k_i$ and $k_j$ as components and $k_i$ is a
sub-conjunction of $k_j$ (Dxg-1), then $k_j$ is omitted.

p. A disjunction occurring as a component in a conjunction is
distributed (that is to say, $h \,.\, (k_1 \lor k_2 \lor \ldots \lor k_n) \,.\, l$ with $n \geq 2$ is
replaced by $(h \,.\, k_1 \,.\, l) \lor (h \,.\, k_2 \,.\, l) \lor \ldots \lor (h \,.\, k_n \,.\, l)$).


For the transformation into the conjunctive normal form, the rules (a) 
to (n) remain unchanged, but instead of (o) and (p) analogous rules are 


taken which refer to the dual forms. 
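
The effect of the rules for the disjunctive normal form can be illustrated by a short sketch for nongeneral sentences without identity. It is not a literal implementation of D21-2: instead of rewriting in the prescribed order, it computes the result directly, covering the work of rules (e), (f), (h), (j), (m), (n), (o), and (p). Rule (g) is not implemented. Sentences are encoded as nested tuples, and the atom names `A`, `B`, `C` are hypothetical. A normal form is represented as a set of conjunctions, each conjunction being a set of literals; the empty set plays the role of ‘~t’ and the set containing only the empty conjunction plays the role of ‘t’.

```python
from itertools import product

def dnf(f, positive=True):
    """Disjunctive form of f (or of ~f if positive is False)."""
    op = f[0]
    if op == "atom":
        return {frozenset([(f[1], positive)])}
    if op == "not":                       # rules (e), (m), (n): push negation inward
        return dnf(f[1], not positive)
    parts = [dnf(g, positive) for g in f[1:]]
    if (op == "or") == positive:          # behaves as a disjunction
        return set().union(*parts)
    result = {frozenset()}                # behaves as a conjunction: rule (p)
    for part in parts:
        result = {c | d for c in result for d in part}
    return result

def simplify(conjs):
    # rules (h), (j): drop conjunctions containing an atom and its negation
    consistent = {c for c in conjs if not any((a, not s) in c for a, s in c)}
    # rule (o): drop a conjunction if a proper sub-conjunction is present
    return {c for c in consistent if not any(d < c for d in consistent)}

def evaluate(f, v):
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not evaluate(f[1], v)
    vals = [evaluate(g, v) for g in f[1:]]
    return all(vals) if op == "and" else any(vals)

def holds(conjs, v):
    return any(all(v[a] == s for a, s in c) for c in conjs)

A, B, C = ("atom", "A"), ("atom", "B"), ("atom", "C")
example = ("and", ("not", ("or", A, ("not", B))), ("or", B, C))  # ~(A V ~B) . (B V C)
normal_form = simplify(dnf(example))                             # the single conjunction ~A . B
```

Comparing `evaluate(example, v)` with `holds(normal_form, v)` for every valuation `v` checks, for this one example, that the sentence and its normal form hold in the same cases.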
T21-10. On normal forms. Let i be any sentence in $\mathfrak{L}_N$ or any
nongeneral sentence in $\mathfrak{L}_\infty$; let j be a sentence of disjunctive normal form
corresponding to i, and k a sentence of conjunctive normal form
corresponding to i.

a. i, j, and k are L-equivalent to each other. 

b. If i is nongeneral, then j does not contain any pr or in not occurring
in i; likewise k.

c. j has one of the following forms: (1) ‘t’; (2) ‘~t’; (3) a basic
sentence; (4) a conjunction of two or more basic sentences, in which
no atomic sentence occurs together with its negation; (5) a
disjunction of two or more components of the forms (3) or (4).

d. k has one of the following forms: (1) ‘t’; (2) ‘~t’; (3) a basic
sentence; (4) a disjunction of two or more basic sentences, in which no
atomic sentence occurs together with its negation; (5) a conjunction
of two or more components of the forms (3) or (4).




T21-11. Let i and j be any factual nongeneral sentences (in a system $\mathfrak{L}$)
which have no in in common. Then the following holds.

a. $i \,.\, j$ is factual.


Proof. 1. Let $i_1 \lor i_2 \lor \ldots \lor i_m$ ($m \geq 1$) be a sentence of disjunctive normal
form corresponding to i; analogously, $j_1 \lor j_2 \lor \ldots \lor j_n$ ($n \geq 1$) for j. $i \,.\, j$ is
L-equivalent to $(i_1 \lor \ldots \lor i_m) \,.\, (j_1 \lor \ldots \lor j_n)$ (T10a), hence, by distribution
(T5m(3)), to $(i_1 \,.\, j_1) \lor (i_1 \,.\, j_2) \lor \ldots \lor (i_m \,.\, j_n)$ with conjunctions for all pairs
of an i-sentence and a j-sentence. Let $i' \,.\, j'$ be an arbitrary one of these
conjunctions. i′ is a basic sentence or a conjunction of basic sentences (T10c);
likewise j′; neither i′ nor j′ contains any atomic sentence together with its
negation (T10c), and they have no in in common (T10b). Therefore, $i' \,.\, j'$
is a conjunction of basic sentences which does not contain any atomic
sentence together with its negation. Hence, it is not L-false (T20-4g). Since this
holds for every component of the disjunction, the whole disjunction is not
L-false (T20-2q), and hence $i \,.\, j$ is not L-false. 2. Since neither i nor j is L-true,
$i \,.\, j$ is not L-true (T20-2p). Hence, $i \,.\, j$ is factual.


b. $i \lor j$ is factual.


Proof. $\sim i$ and $\sim j$ are factual (T20-6a); hence likewise $\sim i \,.\, \sim j$ (a), hence
$\sim(\sim i \,.\, \sim j)$ (T20-6a), hence $i \lor j$.


c. j is not L-dependent upon i, that is, i L-implies neither j nor $\sim j$.

Proof. $\sim i$ is factual (T20-6a). Therefore not $\vdash i \supset j$, since $\sim i \lor j$ is factual
(b). Likewise with $\sim j$, since this is factual too (T20-6a).
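
The role of the condition that i and j have no in in common can be illustrated by a truth-table check. The sketch assumes three hypothetical atomic sentences `Pa`, `Qa`, `Pb` with primitive predicates, so that every assignment of truth-values to them counts as a state-description; sentences are modeled as functions on such assignments.

```python
from itertools import product

ATOMS = ["Pa", "Qa", "Pb"]  # hypothetical atomic sentences

def status(sentence, atoms=ATOMS):
    """Classify a sentence as L-true, L-false, or factual by its truth table."""
    rows = [sentence(dict(zip(atoms, vals)))
            for vals in product([False, True], repeat=len(atoms))]
    if all(rows):
        return "L-true"
    if not any(rows):
        return "L-false"
    return "factual"

i = lambda z: z["Pa"] or z["Qa"]                    # mentions only 'a'
j = lambda z: not z["Pb"]                           # mentions only 'b'
j_shared = lambda z: not z["Pa"] and not z["Qa"]    # factual, but shares 'a' with i

conjunction = lambda z: i(z) and j(z)               # factual, as T21-11a requires
disjunction = lambda z: i(z) or j(z)                # factual, as T21-11b requires
clash = lambda z: i(z) and j_shared(z)              # L-false: the condition is violated
```

The last sentence shows that two factual sentences which share an in may well have an L-false conjunction.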
§ 22. Theorems on General Sentences 


Some theorems concerning general sentences (sentences with universal or
existential quantifiers) are listed for later reference.


The theorems in this section apply the L-concepts to general sentences;
hence they belong to that part of deductive logic which is sometimes
called lower functional logic or logic of quantification.

T22-1. Let i be a universal sentence $(i_k)(\mathfrak{M}_k)$, and $\mathfrak{K}_i$ the class of the
instances of $\mathfrak{M}_k$.

a. i and $\mathfrak{K}_i$ are L-equivalent. (From T18-1h and i.)

b. i is L-true if and only if $\mathfrak{K}_i$ is L-true. (From (a).)

c. i is L-true if and only if every instance of $\mathfrak{M}_k$ is L-true. (From (b),
T20-2a.)

T22-2. Let i be the sentence $(i_{k1})(i_{k2}) \ldots (i_{kn})(\mathfrak{M}_k)$ (hence all variables
occurring freely in $\mathfrak{M}_k$ are among the variables $i_{k1}, \ldots, i_{kn}$), and $\mathfrak{K}_i$ be
the class of the instances of $\mathfrak{M}_k$.

a. i and $\mathfrak{K}_i$ are L-equivalent. (From T1a, by mathematical induction
with respect to n.)

b. i is L-true if and only if $\mathfrak{K}_i$ is L-true. (From (a).)

c. i is L-true if and only if every instance of $\mathfrak{M}_k$ is L-true. (From (b),
T20-2a.)

T22-3. For $\mathfrak{L}_N$.

a. $(i_k)(\mathfrak{M}_k)$ is L-equivalent in $\mathfrak{L}_N$ to any conjunction of the N instances
of $\mathfrak{M}_k$ (in any order). (From T18-1g and h.)

b. $(i_{k1})(i_{k2}) \ldots (i_{kn})(\mathfrak{M}_k)$ is L-equivalent in $\mathfrak{L}_N$ to any conjunction of
the instances of $\mathfrak{M}_k$ (in any order). (From (a), by mathematical
induction with respect to n.)

c. $(\exists i_k)(\mathfrak{M}_k)$ is L-equivalent in $\mathfrak{L}_N$ to any disjunction of the N
instances of $\mathfrak{M}_k$ (in any order). (From (a).)

d. $(\exists i_{k1})(\exists i_{k2}) \ldots (\exists i_{kn})(\mathfrak{M}_k)$ is L-equivalent in $\mathfrak{L}_N$ to any
disjunction of the instances of $\mathfrak{M}_k$ (in any order). (From (c), by
mathematical induction with respect to n.)


T3 shows that any variable can be eliminated, and hence any general 
sentence transformed into a nongeneral one. However, the result of the 
transformation is different in different systems (see § 15A).
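
The elimination of variables described by T3 can be sketched for a finite system. The sketch assumes a hypothetical system with N = 3 and the individual constants `a`, `b`, `c`; sentences are encoded as nested tuples, with `("all", var, scope)` and `("some", var, scope)` for the two quantifiers. The encoding and the function names are illustrative assumptions only.

```python
INDIVIDUALS = ["a", "b", "c"]  # hypothetical in of a system with N = 3

def substitute(f, var, const):
    """Substitute the constant for all free occurrences of the variable."""
    op = f[0]
    if op == "atom":               # ("atom", predicate, argument, ...)
        return ("atom", f[1], *[const if t == var else t for t in f[2:]])
    if op in ("all", "some"):
        if f[1] == var:            # the variable is bound anew; nothing free inside
            return f
        return (op, f[1], substitute(f[2], var, const))
    return (op, *[substitute(g, var, const) for g in f[1:]])

def expand(f):
    """Replace every quantifier by the conjunction or disjunction of its instances."""
    op = f[0]
    if op == "atom":
        return f
    if op in ("all", "some"):
        _, var, scope = f
        instances = [expand(substitute(scope, var, c)) for c in INDIVIDUALS]
        return ("and" if op == "all" else "or", *instances)
    return (op, *[expand(g) for g in f[1:]])

universal = expand(("all", "x", ("atom", "P", "x")))
nested = expand(("some", "x", ("all", "y", ("atom", "R", "x", "y"))))
```

With a different list of individuals the same general sentence expands into a different nongeneral sentence, which is the dependence on the system noted above.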


T22-4. Let i be tautologous (D21-1) with respect to the components
$j_1, j_2, \ldots, j_n$, and $\mathfrak{M}_i$ be formed from i by replacing the j-components with
any matrices (all occurrences of a component to be replaced by
occurrences of the same matrix). Then $\vdash (\,)(\mathfrak{M}_i)$. [‘( )’ stands for a series of
universal quantifiers with all variables occurring freely in $\mathfrak{M}_i$; see § 14.]

Proof. Every instance of $\mathfrak{M}_i$ is tautologous and hence L-true (T21-1).
Therefore $\vdash (\,)(\mathfrak{M}_i)$ (T2c).
T22-5. Let $\mathfrak{M}_k$ be $(i_i)(i_j)(\mathfrak{M}_i) \supset (i_j)(i_i)(\mathfrak{M}_i)$. Then $\vdash (\,)(\mathfrak{M}_k)$.


Proof. Let $\mathfrak{Z}_i$ be an arbitrary $\mathfrak{Z}$ in $\mathfrak{L}$. We shall show that $(\,)(\mathfrak{M}_k)$ holds in $\mathfrak{Z}_i$;
hence it holds in every $\mathfrak{Z}$ and is therefore L-true. Let $\mathfrak{M}'_i$ be formed from $\mathfrak{M}_i$
by substituting any individual constants for all those variables other than $i_i$
and $i_j$ which may occur freely in $\mathfrak{M}_i$. 1. Suppose that every instance of $\mathfrak{M}'_i$ holds
in $\mathfrak{Z}_i$. Then $(i_j)(i_i)(\mathfrak{M}'_i)$ holds in $\mathfrak{Z}_i$ (D18-4g twice), hence also
$(i_i)(i_j)(\mathfrak{M}'_i) \supset (i_j)(i_i)(\mathfrak{M}'_i)$; let this sentence be l. 2. Suppose that the condition (1) is not
fulfilled. Then there is an instance of $\mathfrak{M}'_i$ which does not hold in $\mathfrak{Z}_i$. Hence
$(i_i)(i_j)(\mathfrak{M}'_i)$ does not hold in $\mathfrak{Z}_i$. Therefore the negation of this sentence holds in
$\mathfrak{Z}_i$ (T19-2), and hence l too. Thus, in any case, l holds in $\mathfrak{Z}_i$, and likewise any
other instance of $\mathfrak{M}_k$, and hence also $(\,)(\mathfrak{M}_k)$.


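T22-6. Let $\mathfrak{M}_k$ be $(i_k)(\mathfrak{M}_i \supset \mathfrak{M}_j) \supset [(i_k)(\mathfrak{M}_i) \supset (i_k)(\mathfrak{M}_j)]$. Then
$\vdash (\,)(\mathfrak{M}_k)$.


Proof. We prove the theorem (as in T5) by showing that for any $\mathfrak{Z}_i$, $(\,)(\mathfrak{M}_k)$
holds in $\mathfrak{Z}_i$. Let l be an arbitrary instance of $\mathfrak{M}_k$; thus l has the form
$(i_k)(\mathfrak{M}'_i \supset \mathfrak{M}'_j) \supset [(i_k)(\mathfrak{M}'_i) \supset (i_k)(\mathfrak{M}'_j)]$, where $\mathfrak{M}'_i$ and $\mathfrak{M}'_j$ are formed from $\mathfrak{M}_i$ and
$\mathfrak{M}_j$, respectively, by the same substitutions for all free variables except $i_k$.
1. Suppose that, for every instance of $\mathfrak{M}'_i$ which holds in $\mathfrak{Z}_i$, the corresponding
instance (that with the same in substituted for $i_k$) of $\mathfrak{M}'_j$ holds likewise in $\mathfrak{Z}_i$.
A. Suppose further that every instance of $\mathfrak{M}'_i$ holds in $\mathfrak{Z}_i$. Then every instance
of $\mathfrak{M}'_j$ holds likewise in $\mathfrak{Z}_i$, hence also $(i_k)(\mathfrak{M}'_j)$, hence also $(i_k)(\mathfrak{M}'_i) \supset (i_k)(\mathfrak{M}'_j)$,
hence also l. B. Suppose the condition (A) is not fulfilled. Then there is an
instance of $\mathfrak{M}'_i$ which does not hold in $\mathfrak{Z}_i$. Hence, $(i_k)(\mathfrak{M}'_i)$ does not hold in
$\mathfrak{Z}_i$. Therefore, the negation of this sentence holds in $\mathfrak{Z}_i$ (T19-2), hence also
$(i_k)(\mathfrak{M}'_i) \supset (i_k)(\mathfrak{M}'_j)$, hence also l. 2. Suppose the condition (1) is not
fulfilled. Then there is an instance of $\mathfrak{M}'_i$ such that it holds in $\mathfrak{Z}_i$ while the
corresponding instance of $\mathfrak{M}'_j$ does not. Therefore, the corresponding instance of
$\mathfrak{M}'_i \supset \mathfrak{M}'_j$ does not hold in $\mathfrak{Z}_i$, and hence neither does $(i_k)(\mathfrak{M}'_i \supset \mathfrak{M}'_j)$.
Therefore, the negation of the latter sentence holds in $\mathfrak{Z}_i$, and hence l too. Thus, l
holds in $\mathfrak{Z}_i$ in any case, and likewise any other instance of $\mathfrak{M}_k$, and hence
$(\,)(\mathfrak{M}_k)$.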
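
The case distinction in the proof of T6 can be mirrored by a brute-force check on small finite domains. The sketch below is only a plausibility check under strong assumptions: it treats two matrices with one free variable as arbitrary extensions (tuples of truth-values, one per individual) and verifies the scheme for every pair of extensions. It also shows that the converse scheme fails. The function names are hypothetical.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

def t22_6(m1, m2):
    """(x)(M1x > M2x) > [(x)M1x > (x)M2x], evaluated on given extensions."""
    general = all(implies(a, b) for a, b in zip(m1, m2))
    return implies(general, implies(all(m1), all(m2)))

def converse(m1, m2):
    """[(x)M1x > (x)M2x] > (x)(M1x > M2x), which is not valid in general."""
    general = all(implies(a, b) for a, b in zip(m1, m2))
    return implies(implies(all(m1), all(m2)), general)

def holds_on_every_extension(scheme, n):
    """Check a scheme for all pairs of extensions over a domain of n individuals."""
    extensions = list(product([False, True], repeat=n))
    return all(scheme(m1, m2) for m1 in extensions for m2 in extensions)
```

For a domain of two individuals the converse fails, for instance when the first property holds of exactly one individual and the second of none.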


We omit proofs for the following theorems. Those for T7 and T8 can
easily be constructed in a way similar to those given for T5 and T6. Our
theorems T4, T5, T6, T7, and T8a, when applied to purely general
sentences (that is, without in), correspond to Quine’s axioms of quantification
([Math. Logic], p. 88, *100-*104). As the only rule of inference, he uses
the modus ponens (ibid., *105); to this rule, our T20-2 0 and c correspond.
Therefore, every sentence provable (‘theorem’) in Quine’s system of
quantification is L-true in our systems $\mathfrak{L}$. In this way we obtain all items of T9
and T11 if applied to purely general sentences (Quine [Math. Logic]
§§ 17, 19, 20, 21). T8b and the other theorems applied to sentences with
in are then obtained with the help of T1c.


T22-7. $\vdash (\,)[\mathfrak{M}_i \supset (i_k)(\mathfrak{M}_i)]$, where $i_k$ does not occur freely in $\mathfrak{M}_i$.

T22-8. $\vdash (\,)[(i_k)(\mathfrak{M}_k) \supset \mathfrak{M}'_k]$, where $\mathfrak{M}'_k$ is formed from $\mathfrak{M}_k$ by
substituting for $i_k$ either (a) another i, or (b) an in.


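T22-9.

a. $\vdash (\,)[(i_k)(\mathfrak{M}_k) \supset \mathfrak{M}_k]$.

b. If $\vdash (\,)(\mathfrak{M}_i \supset \mathfrak{M}_j)$ and $\vdash (\,)(\mathfrak{M}_i)$, then $\vdash (\,)(\mathfrak{M}_j)$.

c. If $\vdash (\,)(\mathfrak{M}_i)$, then $\vdash (\,)(i_k)(\mathfrak{M}_i)$.

d. If $\vdash (\,)(\mathfrak{M}_{k1} \supset \mathfrak{M}_{k2})$, $\vdash (\,)(\mathfrak{M}_{k2} \supset \mathfrak{M}_{k3})$, ...,
and $\vdash (\,)(\mathfrak{M}_{k,n-1} \supset \mathfrak{M}_{kn})$, then $\vdash (\,)(\mathfrak{M}_{k1} \supset \mathfrak{M}_{kn})$.

e. If $\vdash (\,)(\mathfrak{M}_{k1} \equiv \mathfrak{M}_{k2})$, $\vdash (\,)(\mathfrak{M}_{k2} \equiv \mathfrak{M}_{k3})$, ...,
and $\vdash (\,)(\mathfrak{M}_{k,n-1} \equiv \mathfrak{M}_{kn})$, then $\vdash (\,)(\mathfrak{M}_{k1} \equiv \mathfrak{M}_{kn})$.

f. $\vdash (\,)[(i_k)(\mathfrak{M}_i \equiv \mathfrak{M}_j) \supset ((i_k)(\mathfrak{M}_i) \equiv (i_k)(\mathfrak{M}_j))]$.

g. $\vdash (\,)[(i_{k1})(i_{k2}) \ldots (i_{kn})(\mathfrak{M}_k) \supset \mathfrak{M}_k]$.

h. If $\vdash (\,)[\mathfrak{M}_i \supset \mathfrak{M}_k]$ and none of the variables $i_{k1}, \ldots, i_{kn}$ occurs
freely in $\mathfrak{M}_i$, then $\vdash (\,)[\mathfrak{M}_i \supset (i_{k1})(i_{k2}) \ldots (i_{kn})(\mathfrak{M}_k)]$.

i. If $i_k$ does not occur freely in $\mathfrak{M}_i$, $\vdash (\,)[\mathfrak{M}_i \equiv (i_k)(\mathfrak{M}_i)]$.

j. $\vdash (\,)[(i_k)(i_l)(\mathfrak{M}_i) \equiv (i_l)(i_k)(\mathfrak{M}_i)]$.

k. $\vdash (\,)[(i_k)(\mathfrak{M}_i \,.\, \mathfrak{M}_j) \equiv (i_k)(\mathfrak{M}_i) \,.\, (i_k)(\mathfrak{M}_j)]$.

l. $\vdash (\,)[(i_k)(\mathfrak{M}_i) \lor (i_k)(\mathfrak{M}_j) \supset (i_k)(\mathfrak{M}_i \lor \mathfrak{M}_j)]$.

m. If $i_k$ does not occur freely in $\mathfrak{M}_i$,
$\vdash (\,)[(i_k)(\mathfrak{M}_i \,.\, \mathfrak{M}_j) \equiv \mathfrak{M}_i \,.\, (i_k)(\mathfrak{M}_j)]$.

n. If $i_k$ does not occur freely in $\mathfrak{M}_i$,
$\vdash (\,)[(i_k)(\mathfrak{M}_i \lor \mathfrak{M}_j) \equiv \mathfrak{M}_i \lor (i_k)(\mathfrak{M}_j)]$.

o. If $\mathfrak{M}_j$ is formed from $\mathfrak{M}_i$ by substituting $i_j$ for $i_i$,
$\vdash (\,)[(i_i)(\mathfrak{M}_i) \equiv (i_j)(\mathfrak{M}_j)]$.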


T11 deals with sentences written with the existential quantifier
(A15-1c).

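T22-11.

a. $\vdash (\,)[\sim(i_k)(\mathfrak{M}_k) \equiv (\exists i_k)(\sim\mathfrak{M}_k)]$.

b. $\vdash (\,)[\sim(\exists i_k)(\mathfrak{M}_k) \equiv (i_k)(\sim\mathfrak{M}_k)]$.

c. $\vdash (\,)[\sim(i_{k1}) \ldots (i_{kn})(\mathfrak{M}_k) \equiv (\exists i_{k1}) \ldots (\exists i_{kn})(\sim\mathfrak{M}_k)]$.

d. $\vdash (\,)[\sim(\exists i_{k1}) \ldots (\exists i_{kn})(\mathfrak{M}_k) \equiv (i_{k1}) \ldots (i_{kn})(\sim\mathfrak{M}_k)]$.

e. $\vdash (\,)[\mathfrak{M}'_k \supset (\exists i_k)(\mathfrak{M}_k)]$, where $\mathfrak{M}'_k$ is formed from $\mathfrak{M}_k$ by
substituting for $i_k$ either (1) another i, or (2) an in.

f. $\vdash (\,)[\mathfrak{M}_k \supset (\exists i_k)(\mathfrak{M}_k)]$.

g. $\vdash (\,)[(i_k)(\mathfrak{M}_k) \supset (\exists i_k)(\mathfrak{M}_k)]$.

h. If $i_k$ does not occur freely in $\mathfrak{M}_i$, $\vdash (\,)[\mathfrak{M}_i \equiv (\exists i_k)(\mathfrak{M}_i)]$.

i. $\vdash (\,)[(\exists i_k)(\exists i_l)(\mathfrak{M}_i) \equiv (\exists i_l)(\exists i_k)(\mathfrak{M}_i)]$.

j. $\vdash (\,)[(\exists i_k)(i_l)(\mathfrak{M}_i) \supset (i_l)(\exists i_k)(\mathfrak{M}_i)]$.

k. $\vdash (\,)[(\exists i_k)(\mathfrak{M}_i \lor \mathfrak{M}_j) \equiv (\exists i_k)(\mathfrak{M}_i) \lor (\exists i_k)(\mathfrak{M}_j)]$.

l. $\vdash (\,)[(i_k)(\mathfrak{M}_i \lor \mathfrak{M}_j) \supset (\exists i_k)(\mathfrak{M}_i) \lor (i_k)(\mathfrak{M}_j)]$.

m. $\vdash (\,)[(i_k)(\mathfrak{M}_i \supset \mathfrak{M}_j) \supset ((\exists i_k)(\mathfrak{M}_i) \supset (\exists i_k)(\mathfrak{M}_j))]$.

n. $\vdash (\,)[(i_k)(\mathfrak{M}_i) \,.\, (\exists i_k)(\mathfrak{M}_j) \supset (\exists i_k)(\mathfrak{M}_i \,.\, \mathfrak{M}_j)]$.

o. $\vdash (\,)[(\exists i_k)(\mathfrak{M}_i \,.\, \mathfrak{M}_j) \supset (\exists i_k)(\mathfrak{M}_i) \,.\, (\exists i_k)(\mathfrak{M}_j)]$.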


§ 23. Theorems on Replacements 


Two expressions are called L-interchangeable if the replacement of the one 
by the other in any sentence i transforms i into an L-equivalent sentence (D1). 
Some theorems on replacements and L-interchangeability are listed for later 
use. The one most frequently used is this: L-equivalent sentences (or matrices, 
or predicate expressions) are L-interchangeable (T1b, T2c).


We use the term ‘replacement’ in the widest sense, for the procedure of 
deleting one expression and putting another one in its place. Hence, if 
we speak of the replacement of some occurrences of an expression M; in 
a sentence j, we mean the replacement of one or several such occurrences 
and not necessarily of all occurrences of M; in j. (On the other hand, the 
term ‘substitution’ is always used in the following special sense: a
variable, hence here in the systems $\mathfrak{L}$ an i, is replaced at all places where it
occurs freely within the context in question by the same expression, here
an i or an in.) Two expressions which can always be replaced by each
other without thereby changing the logical content or meaning of the
sentence (in other words, the proposition expressed by it) are called
L-interchangeable (D1).


D23-1. $\mathfrak{A}_i$ is L-interchangeable with $\mathfrak{A}_j$ in $\mathfrak{L}$ $=_{\mathrm{Df}}$ if i and j are any
sentences in $\mathfrak{L}$ such that j is formed from i by replacing one occurrence of
$\mathfrak{A}_i$ with $\mathfrak{A}_j$ or vice versa, then $\vdash i \equiv j$; and there is at least one pair of
sentences i, j of this kind.


The theorems T1 and T2 are important and well-known theorems
concerning the L-interchangeability of sentences and matrices, respectively.
(For proofs see, for instance, Quine [Math. Logic] § 18.) 


+T23-1. Let j and j′ be any sentences in $\mathfrak{L}$, and i′ be formed from i by
replacing some occurrences of j with j′. Then the following holds.

a. $\vdash (j \equiv j') \supset (i \equiv i')$.

b. If $\vdash j \equiv j'$, then $\vdash i \equiv i'$; in other words, L-equivalent sentences are
L-interchangeable.

T1b has been utilized in T21-5C.
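
T1b can be illustrated on propositional sentences by a truth-table comparison. The sketch below makes several simplifying assumptions: sentences are nested tuples over hypothetical atoms `A`, `B`, `C`; L-equivalence is tested by truth tables; and `replace_all` replaces every occurrence of j, whereas the theorem allows the replacement of only some occurrences.

```python
from itertools import product

def evaluate(f, v):
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not evaluate(f[1], v)
    if op == "and":
        return all(evaluate(g, v) for g in f[1:])
    if op == "or":
        return any(evaluate(g, v) for g in f[1:])
    raise ValueError(op)

def atoms(f):
    """Names of the atomic sentences occurring in f."""
    return {f[1]} if f[0] == "atom" else set().union(*[atoms(g) for g in f[1:]])

def replace_all(f, old, new):
    """Replace every occurrence of the sentence `old` in f by `new`."""
    if f == old:
        return new
    if f[0] == "atom":
        return f
    return (f[0], *[replace_all(g, old, new) for g in f[1:]])

def l_equivalent(f, g):
    names = sorted(atoms(f) | atoms(g))
    rows = [dict(zip(names, vals)) for vals in product([False, True], repeat=len(names))]
    return all(evaluate(f, v) == evaluate(g, v) for v in rows)

A, B, C = ("atom", "A"), ("atom", "B"), ("atom", "C")
j = ("not", ("and", A, B))                    # ~(A . B)
j_prime = ("or", ("not", A), ("not", B))      # ~A V ~B, L-equivalent to j
i = ("or", ("and", j, C), ("not", j))
i_prime = replace_all(i, j, j_prime)
```

Here i′ differs from i as an expression but has the same truth table.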
T2 is analogous to T1 but more general, concerning any matrices.


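+T23-2. Let $\mathfrak{M}_i$ and $\mathfrak{M}_j$ be any matrices in $\mathfrak{L}$. Let $\mathfrak{M}'_k$ be formed from
$\mathfrak{M}_k$ by replacing some occurrences of $\mathfrak{M}_i$ with $\mathfrak{M}_j$, and i′ from i in the same
way. Then the following holds.

a. $\vdash (\,)[(\,)(\mathfrak{M}_i \equiv \mathfrak{M}_j) \supset (\mathfrak{M}_k \equiv \mathfrak{M}'_k)]$.

b. $\vdash (\,)(\mathfrak{M}_i \equiv \mathfrak{M}_j) \supset (i \equiv i')$.

c. If $\vdash (\,)(\mathfrak{M}_i \equiv \mathfrak{M}_j)$, then $\vdash i \equiv i'$; in other words, if two matrices
(or predicate expressions) are L-equivalent (D25-1g), then they are
L-interchangeable.

T2c has been utilized in T21-5E.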

The following theorems are consequences of T1 and T2. They concern
replacements by ‘t’ and ‘~t’. T4 is useful in deductive logic for the
simplification of sentences and, in particular, for the transformation into a
normal form (see, for instance, D21-2). T5 and T6, together with T4,
will be used in inductive logic; there, not only L-true sentences but others
similar to them, which we shall call almost L-true, will be replaced by ‘t’
for certain purposes.

T23-4.

a. If $\vdash i$, i is L-interchangeable with ‘t’.

Proof. If $\vdash i$, then $\vdash i \equiv t$ (T21-5u(1)). Hence theorem from T1b.

b. If $\vdash \sim i$, i is L-interchangeable with ‘~t’. (From T21-5u(2), like (a).)

c. If $\vdash (\,)(\mathfrak{M}_i)$, $\mathfrak{M}_i$ is L-interchangeable with ‘t’.

Proof. $\vdash (\,)[(\mathfrak{M}_i \equiv t) \equiv \mathfrak{M}_i]$ (T21-5u(1)D). Hence $\vdash (\,)(\mathfrak{M}_i \equiv t) \equiv (\,)(\mathfrak{M}_i)$
(T22-9f). Therefore, if $\vdash (\,)(\mathfrak{M}_i)$, then $\vdash (\,)(\mathfrak{M}_i \equiv t)$. Hence theorem from T2c.

d. If $\vdash (\,)(\sim\mathfrak{M}_i)$, $\mathfrak{M}_i$ is L-interchangeable with ‘~t’. (From T21-5u(2),
like (c).)

T23-5. Let j be any sentence in $\mathfrak{L}$, and i′ be formed from i by replacing
some occurrences of j with ‘t’. Then $\vdash j \supset (i \equiv i')$. (From T1a, T21-5u(1).)

T23-6. Let j be any sentence in $\mathfrak{L}$, and i′ be formed from i by replacing
some occurrences of j with ‘~t’. Then the following holds.

a. $\vdash j \lor (i \equiv i')$. (From T1a, T21-5u(2).)

b. $\vdash j \lor i \equiv j \lor i'$. (From (a), T21-5m(7).)
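
T5 and T6 can be checked on a small propositional example by truth tables. The following sketch assumes two hypothetical atoms `A` and `B`, encodes ‘t’ as the tuple `("t",)`, and replaces all occurrences of j (the theorems allow the replacement of only some). It is a check of one example, not a proof.

```python
from itertools import product

T = ("t",)  # stands for the tautological sentence 't'

def evaluate(f, v):
    op = f[0]
    if op == "t":
        return True
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not evaluate(f[1], v)
    if op == "and":
        return all(evaluate(g, v) for g in f[1:])
    if op == "or":
        return any(evaluate(g, v) for g in f[1:])
    raise ValueError(op)

def replace_all(f, old, new):
    if f == old:
        return new
    if f[0] in ("atom", "t"):
        return f
    return (f[0], *[replace_all(g, old, new) for g in f[1:]])

A, B = ("atom", "A"), ("atom", "B")
j = A
i = ("or", ("and", A, B), ("not", A))
i_t = replace_all(i, j, T)                  # j replaced by 't'
i_not_t = replace_all(i, j, ("not", T))     # j replaced by '~t'

ROWS = [dict(zip("AB", vals)) for vals in product([False, True], repeat=2)]

# T23-5: j > (i = i') holds in every row.
t23_5 = all((not evaluate(j, v)) or evaluate(i, v) == evaluate(i_t, v) for v in ROWS)
# T23-6a: j V (i = i') holds in every row.
t23_6a = all(evaluate(j, v) or evaluate(i, v) == evaluate(i_not_t, v) for v in ROWS)
# T23-6b: (j V i) = (j V i') holds in every row.
t23_6b = all((evaluate(j, v) or evaluate(i, v)) == (evaluate(j, v) or evaluate(i_not_t, v))
             for v in ROWS)
# Without the antecedent j, i and i' need not be L-equivalent.
unconditional = all(evaluate(i, v) == evaluate(i_t, v) for v in ROWS)
```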


§ 24. Theorems on Identity 
Some theorems on identity are listed for later reference. ‘a = a’ is L-true
(T1a), as customary; in our systems, ‘a = b’ is L-false (T1b).


This section contains theorems on identity. They are chiefly based on
T18-2b, c, which in turn is based on D18-4. We remember that these rules
have been laid down in such a way that ‘a = a’ is L-true, while ‘a = b’
is L-false.


+T24-1.

a. $\vdash in_i = in_i$. (From T18-2b.)

b. $\vdash in_i \neq in_j$, where $in_i$ and $in_j$ are two distinct in. (From T18-1c.)


T24-2.

a. $\vdash (i_k)(i_k = i_k)$. (From T22-1c, T1a.)

b. $\vdash (\exists i_k)(i_k = in_i)$, that is, $\vdash \sim(i_k)[\sim(i_k = in_i)]$. (From T1a,
T22-11e.)

c. $\vdash (i_i)(\exists i_k)(i_k = i_i)$, that is, $\vdash (i_i) \sim(i_k)[\sim(i_k = i_i)]$. (From (b),
T22-1c.)


T24-3. Let $\mathfrak{M}_k$ be $i_i = i_j \supset (\mathfrak{M}_i \supset \mathfrak{M}_j)$ where $\mathfrak{M}_j$ is formed from $\mathfrak{M}_i$
by substituting $i_j$ for $i_i$. Then $\vdash (\,)(\mathfrak{M}_k)$.

Proof. Let k be any instance of $\mathfrak{M}_k$. Then k has the form $in_i = in_j \supset (i \supset j)$,
where j differs from i, if at all, by containing $in_j$ at some places where i has $in_i$.
1. If $in_i$ is the same as $in_j$, then i is the same as j, hence $\vdash i \supset j$ (T21-3c), hence
$\vdash k$ (T20-2g). 2. If $in_i$ is not the same as $in_j$, $\vdash \sim(in_i = in_j)$ (T1b), and hence
$\vdash k$ (T20-2h). Thus $\vdash k$ in any case. Since this holds for every instance of $\mathfrak{M}_k$,
$\vdash (\,)(\mathfrak{M}_k)$ (T22-2c).


The following theorem T4 differs from the other theorems by holding 
only for certain systems.


T24-4. Let $i_k, i_{k1}, i_{k2}, \ldots, i_{km}$ be m + 1 different i.

a. For $m \geq 1$, the following holds for $\mathfrak{L}_\infty$ and for every $\mathfrak{L}_N$ with N > m:
$\vdash (\,)(\exists i_k)[i_k \neq i_{k1} \,.\, i_k \neq i_{k2} \,.\, \ldots \,.\, i_k \neq i_{km}]$, that is,
$\vdash (\,) \sim(i_k)[i_k = i_{k1} \lor i_k = i_{k2} \lor \ldots \lor i_k = i_{km}]$.

Proof. Let j be an instance of $(\exists i_k)[\ldots]$. Then j contains at most m in.
Therefore, in any of the systems mentioned, there is an $in_g$ which is different
from all in in j. Hence, if k is constructed from the scope in j by substituting
$in_g$ for $i_k$, every conjunctive component in k is L-true (T1b), and hence $\vdash k$,
and hence $\vdash j$. Therefore the sentence described is L-true (T22-2c).


b. For $m \geq 0$, $n \geq 0$, $m + n \geq 1$, the following holds for $\mathfrak{L}_\infty$ and for
every $\mathfrak{L}_N$ with N > m + n. Let $in_{j1}, in_{j2}, \ldots, in_{jn}$ be some in (not
necessarily distinct) in the system in question.

$\vdash (\,)(\exists i_k)[i_k \neq i_{k1} \,.\, i_k \neq i_{k2} \,.\, \ldots \,.\, i_k \neq i_{km} \,.\, i_k \neq in_{j1} \,.\, i_k \neq in_{j2} \,.\, \ldots
\,.\, i_k \neq in_{jn}]$, that is, $\vdash (\,) \sim(i_k)[i_k = i_{k1} \lor \ldots \lor i_k = i_{km} \lor i_k = in_{j1}
\lor \ldots \lor i_k = in_{jn}]$. (From (a).)


The theorems T4a and b can easily be made plausible, since the
sentence described in (a) (or in (b)) says that the number of individuals in
the domain in question (in (b): except those designated by the in
occurring) is greater than m.
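
The plausibility consideration for T4a can be tested by brute force on small domains. The sketch below evaluates, over a domain of a given size, the statement that for any m individuals there is an individual different from all of them, and compares the result with the condition N > m. The function name is hypothetical, and the check covers only the finite cases that are enumerated.

```python
from itertools import product

def exists_individual_outside(n_individuals, m):
    """For every choice of m individuals, is there one different from all of them?"""
    domain = range(n_individuals)
    return all(any(all(x != y for y in chosen) for x in domain)
               for chosen in product(domain, repeat=m))
```

For instance, in a domain of two individuals the statement holds for m = 1 but fails for m = 2.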

T2a and T3 correspond to the axioms of identity in the system of
Hilbert and Bernays ([Grundlagen], I, 165, formulas ($J_1$) and ($J_2$); these
formulas contain free variables, our theorems refer to the corresponding
closed forms). Therefore, the sentences corresponding to the formulas
proved by these authors on the basis of the two axioms are L-true in our
systems $\mathfrak{L}$. This yields the following theorems T6, T7a, and T8a (see
Hilbert and Bernays, op. cit., formulas 1, 2, 3, 4, 5, 6a, 10a).


T24-6, 

a. H(i = uD (t= iD ty = i). 

b. KO Lis = D y= il. 

Cc. HOi = i; s] (i; =it0i= i,)]. 

d. Oli = tk D (i; = tk h») ti =z i]. 

e. Oli i Dixit. Vis i. 

T24-7. Let M, and M; be formed from M; by substituting i, and inz, 
respectively, for i;. 

a. FOr is V M) = M. 

b. OLN: < ins V M) = Mi]. (From (a).) 

The following is a rather special and complicated theorem that we shall 
need later (in Vol. II) for an important theorem in inductive logic. 

T24-8. Let M, be a molecular matrix (D16-3e) without in and with iz 
as the only free variable, such that all instances of it are factual. Let ir, 
irn tea). . , ten be n + 1 distinct variables (n = 1), Let Mt; be (ts) [iz = tay 
Vic = iV.. V ik = im V MJ]. Let Ma be the disjunction M, V 
Maz V... V Mangi Whose # + 1 components are as follows. Ma: is the 
sentence (i,)(Mtz). For every m from 2to n + 1, Dim is ()[ij: = ija V ifs = ij 

V ism = tim V Mu VM. V...V Mimi» [~M* V...V 

~MA,]. Here ij:, ijz ...,tim are m distinct variables, and the scope de- 
scribed contains for any two distinct ones of these variables the identity 
matrix as a disjunctive component, and further the matrices Mu, ..., 
Mim, Which are formed from Wt, by substituting for t+ one of the variables 
ij, ..., Ìm in turn. The matrices M%,, etc., are formed as follows. They 
correspond to the s subclasses containing m — 1 of the n variables ir, 
.. +, te, mentioned before. [The number of these subclasses is (,,” ,) 
(T40-32d); hence this is also the number s of the matrices Ntm., etc., to 
be described.] For m = 2 there are n subclasses containing one each of 
the » variables mentioned; here, the n matrices are formed from M by 
substituting for i, one of the variables in turn. For m > 2 the matrix Din, 
(for p = 1 to s) is a conjunction containing as components the matrices
formed from M, by substituting for i, one of the variables of the pth 
subclass in turn and furthermore the negations of the identity matrices 
for any two distinct variables of the pth subclass. For m = n + 1 the 
one subclass is the class of all those variables itself; here we have only 
one matrix, namely, I2%,, formed as just described but for all variables. 


a. $\vdash (\,)(\mathfrak{M}_1 \equiv \mathfrak{M}_2)$.


For a proof see Hilbert and Bernays, op. cit., p. 175, formula 10a. The
theorem can be made plausible as follows. $\mathfrak{M}_1$ says: ‘For every x, if $x \neq z_1$ and $x \neq z_2$
and ... and $x \neq z_n$, then Mx’. $\mathfrak{M}_2$ says: ‘Either (1) all x are M, or (2) all x
with at most one exception are M, and $z_1$ or $z_2$ or ... or $z_n$ is not M, or (3) all x
with at most two exceptions are M and either (a) $z_1$ and $z_2$ are not M and are
distinct from each other or (b) $z_1$ and $z_3$ are ... or (c) ... or (.) $z_{n-1}$ and $z_n$
are not M and are distinct from each other, or (4) ..., or (n + 1) all x with
at most n exceptions are M, and $z_1$ and $z_2$ and ... and $z_n$ are not M and are
distinct from one another’. It is easily seen that $\mathfrak{M}_2$ says the same as $\mathfrak{M}_1$.


b. Let $\mathfrak{M}'_1$ be formed from $\mathfrak{M}_1$ by substituting n′ distinct in for n′ out
of the n free variables $i_{k1}, \ldots, i_{kn}$ (n′ < n). Let $\mathfrak{M}'_2$ be formed from
$\mathfrak{M}_2$ by the same substitutions. Then $\vdash (\,)(\mathfrak{M}'_1 \equiv \mathfrak{M}'_2)$. (From (a).)


§ 25. On Predicate Expressions and Divisions 


We admit as abbreviations molecular predicate expressions constructed
from predicates with the help of connectives (A1) and, further, molecular
predicates as abbreviations for such expressions. Among others, the following terms
are defined as applied to matrices or predicate expressions or the corresponding
attributes (properties or relations): ‘universal’ and ‘L-universal’, ‘(L-)empty’,
‘factual’, ‘L-implication’, ‘L-equivalence’ (D1); further, from the theory of
relations, ‘(L-)reflexive’, ‘(L-)symmetric’, etc. (D2). A set of molecular
predicates is called a division if they divide all individuals without overlapping (D4).


The third and largest part of this chapter (§§ 25-38) deals with some 
selected topics of deductive logic. Most of the definitions and theorems 
in this part are new. A number of them, although of interest for deductive 
logic, have not found sufficient attention so far. However, our chief reason 
for developing these special parts of deductive logic here is their useful- 
ness for our theory of inductive logic. Whenever any section of this part 
becomes relevant in later chapters, it will be indicated there. 

It will be convenient to permit as abbreviations compound predicate 
expressions, constructed from predicates with the help of connectives. 
We call them, together with the primitive predicates, molecular predicate 
expressions. 


A25-1. We shall use molecular predicate expressions as unofficial
abbreviations in the following way. At the place of a pr as a component, any
molecular predicate expression is likewise allowed. $\mathfrak{A}_k$ is any suitable
argument expression consisting of one or more individual signs (for
example, ‘axb’ for a predicate expression of degree three).

a. $(\sim pr_i)\mathfrak{A}_k$ for $\sim(pr_i\mathfrak{A}_k)$.

b. $(pr_i \lor pr_j)\mathfrak{A}_k$ for $pr_i\mathfrak{A}_k \lor pr_j\mathfrak{A}_k$.

c. $(pr_i \,.\, pr_j)\mathfrak{A}_k$ for $pr_i\mathfrak{A}_k \,.\, pr_j\mathfrak{A}_k$.

d. $(pr_i \supset pr_j)\mathfrak{A}_k$ for $pr_i\mathfrak{A}_k \supset pr_j\mathfrak{A}_k$.

e. $(pr_i \equiv pr_j)\mathfrak{A}_k$ for $pr_i\mathfrak{A}_k \equiv pr_j\mathfrak{A}_k$.

Thus, for example, ‘$(P_1 \lor P_2)x$’ is short for ‘$P_1x \lor P_2x$’;
‘$[(P_1 \,.\, \sim P_2) \lor \sim(P_1 \,.\, P_2)]a$’ is short for ‘$(P_1a \,.\, \sim P_2a) \lor \sim(P_1a \,.\, P_2a)$’. Note that a
molecular matrix can be abbreviated in this way only if all atomic
matrices occurring in it have the same argument expression; thus, for
example, ‘$P_1a \lor P_2b$’ cannot be abbreviated.
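
The way an abbreviation of this kind is expanded can be sketched as a small recursive function. The encoding is a hypothetical one chosen for illustration: a primitive predicate is a string such as `P1`, a compound predicate expression is a tuple headed by one of the connective signs, and the argument expression is a string of individual signs. The output is a plain string in the notation of the text.

```python
def expand(expr, args):
    """Expand a molecular predicate expression applied to the argument expression."""
    if isinstance(expr, str):          # a primitive predicate such as 'P1'
        return expr + args
    op = expr[0]
    if op == "~":                      # (~pr)A for ~(prA)
        return "~" + wrap(expr[1], args)
    left, right = expr[1], expr[2]     # (pr op pr')A for prA op pr'A
    return wrap(left, args) + " " + op + " " + wrap(right, args)

def wrap(expr, args):
    """Parenthesize compound components; atoms and negations need no parentheses."""
    text = expand(expr, args)
    if isinstance(expr, str) or expr[0] == "~":
        return text
    return "(" + text + ")"

simple = expand(("V", "P1", "P2"), "x")
compound = expand(("V", (".", "P1", ("~", "P2")), ("~", (".", "P1", "P2"))), "a")
```

Because one argument expression is attached to every predicate, a matrix whose atomic components have different argument expressions cannot be produced in this way.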


By a molecular predicate we mean a predicate that either is primitive
or is introduced as an abbreviation for a molecular predicate expression.
We shall usually take ‘M’, ‘M′’, etc., ‘$M_1$’, ‘$M_2$’, etc., as molecular
predicates to be defined from case to case, or also used without specifying
their definitions; further, ‘$Q_1$’, ‘$Q_2$’, etc., for certain molecular predicates
of a special kind to be defined later (§ 31). [For instance, ‘$M_1$’ may be
defined by ‘$P_1 \,.\, \sim P_2$’; then, ‘$M_1a$’ would be short for ‘$(P_1 \,.\, \sim P_2)a$’,
hence for ‘$P_1a \,.\, \sim P_2a$’.] By a molecular attribute (property or relation)
with respect to a system $\mathfrak{L}$ we mean an attribute designated by a
molecular predicate expression (and hence designatable by a molecular predicate
if we care to introduce one).

By a full matrix of a predicate expression $\mathfrak{A}_i$ (which may be a primitive
or defined predicate or a compound molecular predicate expression) of
degree n we mean a matrix of the form $\mathfrak{A}_i\mathfrak{A}_{j1}\mathfrak{A}_{j2} \ldots \mathfrak{A}_{jn}$, where the
$\mathfrak{A}_{j1}$, etc., are individual signs; if all of the latter are individual constants,
and hence the whole is a sentence, we call it a full sentence of $\mathfrak{A}_i$.

Each of the items D1a, b, d, e and D2a, b, c, d, e, f, g is a condensed
formulation of two definitions, one to be read without the two prefixes
‘L-’ included in square brackets, the other to be read with both of them;
thus, for instance, D1a says (1) $\mathfrak{M}_i$ is universal $=_{\mathrm{Df}}$ $(\,)(\mathfrak{M}_i)$ is true, and
(2) $\mathfrak{M}_i$ is L-universal $=_{\mathrm{Df}}$ $(\,)(\mathfrak{M}_i)$ is L-true.

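+D25-1.

a. $\mathfrak{M}_i$ is [L-]universal $=_{\mathrm{Df}}$ the sentence $(\,)(\mathfrak{M}_i)$ is [L-]true.

b. $\mathfrak{M}_i$ is [L-]empty $=_{\mathrm{Df}}$ $(\,)(\sim\mathfrak{M}_i)$ is [L-]true.

c. $\mathfrak{M}_i$ is factual $=_{\mathrm{Df}}$ $\mathfrak{M}_i$ is neither L-universal nor L-empty.

d. $\mathfrak{M}_i$ and $\mathfrak{M}_j$ are [L-]exclusive $=_{\mathrm{Df}}$ $(\,)[\sim(\mathfrak{M}_i \,.\, \mathfrak{M}_j)]$ is [L-]true.

e. $\mathfrak{M}_{i1}, \mathfrak{M}_{i2}, \ldots, \mathfrak{M}_{in}$ are [L-]disjunct $=_{\mathrm{Df}}$ $(\,)[\mathfrak{M}_{i1} \lor \mathfrak{M}_{i2} \lor \ldots \lor \mathfrak{M}_{in}]$
is [L-]true.

f. $\mathfrak{M}_i$ L-implies $\mathfrak{M}_j$ $=_{\mathrm{Df}}$ $\vdash (\,)(\mathfrak{M}_i \supset \mathfrak{M}_j)$.

g. $\mathfrak{M}_i$ is L-equivalent to $\mathfrak{M}_j$ $=_{\mathrm{Df}}$ $\vdash (\,)(\mathfrak{M}_i \equiv \mathfrak{M}_j)$.
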
In D2 some of the customary concepts in the theory of relations are 
defined, together with the corresponding semantical L-concepts. 


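D25-2. Let $\mathfrak{M}_{ij}$ be a matrix with $i_i$ and $i_j$ as the only free variables. Let
$\mathfrak{M}_{ii}$ be formed from $\mathfrak{M}_{ij}$ by substituting $i_i$ for $i_j$, and $\mathfrak{M}_{ik}$ by substituting
$i_k$ for $i_j$. Let $\mathfrak{M}_{ji}$ be formed from $\mathfrak{M}_{ij}$ by simultaneous substitutions of $i_j$
for $i_i$ and $i_i$ for $i_j$; likewise $\mathfrak{M}_{jk}$ by simultaneous substitutions of $i_j$ for $i_i$
and $i_k$ for $i_j$.

a. $\mathfrak{M}_{ij}$ is [L-]reflexive $=_{\mathrm{Df}}$ the sentence $(i_i)(\mathfrak{M}_{ii})$ is [L-]true.

b. $\mathfrak{M}_{ij}$ is [L-]irreflexive $=_{\mathrm{Df}}$ $(i_i)(\sim\mathfrak{M}_{ii})$ is [L-]true.

c. $\mathfrak{M}_{ij}$ is [L-]symmetric $=_{\mathrm{Df}}$ $(\,)(\mathfrak{M}_{ij} \supset \mathfrak{M}_{ji})$ is [L-]true.

d. $\mathfrak{M}_{ij}$ is [L-]asymmetric $=_{\mathrm{Df}}$ $(\,)(\mathfrak{M}_{ij} \supset \sim\mathfrak{M}_{ji})$ is [L-]true.

e. $\mathfrak{M}_{ij}$ is [L-]transitive $=_{\mathrm{Df}}$ $(\,)(\mathfrak{M}_{ij} \,.\, \mathfrak{M}_{jk} \supset \mathfrak{M}_{ik})$ is [L-]true.

f. $\mathfrak{M}_{ij}$ is [L-]intransitive $=_{\mathrm{Df}}$ $(\,)(\mathfrak{M}_{ij} \,.\, \mathfrak{M}_{jk} \supset \sim\mathfrak{M}_{ik})$ is [L-]true.

g. $\mathfrak{M}_{ij}$ is [L-]one-one $=_{\mathrm{Df}}$ the sentence $(\,)(\mathfrak{M}_{ij} \,.\, \mathfrak{M}_{ik} \supset i_j = i_k)$
$\,.\, (\,)(\mathfrak{M}_{ik} \,.\, \mathfrak{M}_{jk} \supset i_i = i_j)$ is [L-]true.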
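
The non-L versions of these concepts (truth of the characteristic sentence in one given interpretation) can be sketched for a finite relation given as a set of ordered pairs. The sketch does not capture the L-concepts, which would require the characteristic sentence to hold in every state-description. The function names and the sample relations are hypothetical.

```python
def reflexive(rel, domain):
    return all((x, x) in rel for x in domain)

def irreflexive(rel, domain):
    return all((x, x) not in rel for x in domain)

def symmetric(rel):
    return all((y, x) in rel for (x, y) in rel)

def asymmetric(rel):
    return all((y, x) not in rel for (x, y) in rel)

def transitive(rel):
    return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)

def intransitive(rel):
    return all((x, z) not in rel for (x, y) in rel for (w, z) in rel if y == w)

def one_one(rel):
    # no individual bears the relation to two, and no two bear it to the same one
    return (all(y == z for (x, y) in rel for (w, z) in rel if x == w)
            and all(x == w for (x, y) in rel for (w, z) in rel if y == z))

DOMAIN = range(4)
less = {(x, y) for x in DOMAIN for y in DOMAIN if x < y}
successor = {(x, x + 1) for x in range(3)}
identity = {(x, x) for x in DOMAIN}
```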

All the terms defined in D1 and D2 will be used in the following three
ways. Each of these terms may be applied

(A) to a matrix, as formulated in the definition;

(B) to a predicate expression $\mathfrak{A}_i$ (that is, to a primitive or defined
predicate or a molecular predicate expression (A1)) of degree n,
if the term applies, according to the definition, to the matrix
$\mathfrak{A}_i i_1 i_2 \ldots i_n$ formed with the first n variables in
alphabetical order;

(C) to the corresponding attribute (namely, the one designated by the
predicate expression in (B) and hence expressed by the matrix
in (A)).

For example, if ‘M’ is defined by ‘$P_1 \,.\, \sim P_2$’ and ‘$(x)(\sim Mx)$’ is true,
then we shall say of each of the following entities that it is empty: (A) the
matrix ‘Mx’ and its expansion ‘$P_1x \,.\, \sim P_2x$’, (B) the molecular predicate
‘M’ and the molecular predicate expression ‘$P_1 \,.\, \sim P_2$’, and (C) the
molecular property M, that is, the property of being $P_1$ but not $P_2$.

The definitions of the L-terms in D1 and D2 state characteristic
sentences for each case in such a way that the L-term holds in the case in
question if and only if the characteristic sentence is L-true. Now D3 says
that for any sentence e (usually a non-L-false sentence serving as
evidence for a degree of confirmation), the L-term with the addition ‘with
respect to e’ holds if and only if the characteristic sentence j is L-implied
by e, hence if and only if $\vdash e \supset j$. (This is analogous to D20-2.)


D25-3.

a. $\mathfrak{M}_i$ is L-universal with respect to e $=_{\mathrm{Df}}$ $\vdash e \supset (\,)(\mathfrak{M}_i)$.

b. Analogously for the L-terms defined in D1b, d, e, f, g, and D2a, b,
c, d, e, f, g.

T25-1. Let $\mathfrak{A}_i$ and $\mathfrak{A}'_i$ be two molecular predicate expressions of
degree one and $\mathfrak{A}_j$ and $\mathfrak{A}'_j$ molecular predicates defined by $\mathfrak{A}_i$ and $\mathfrak{A}'_i$,
respectively. Let i, i′, j, and j′ be the full sentences with $in_k$ of $\mathfrak{A}_i$, $\mathfrak{A}'_i$, $\mathfrak{A}_j$,
and $\mathfrak{A}'_j$, respectively.

a. $\mathfrak{A}_i$, and hence $\mathfrak{A}_j$, is L-universal if and only if i, and hence j, is L-true.

Proof. 1. Suppose $\mathfrak{A}_i$ is L-universal. Then $\vdash (i_k)(\mathfrak{A}_i i_k)$ (D1a). Therefore,
$\vdash i$ (T22-1c). 2. Suppose $\vdash i$, i.e., $\vdash \mathfrak{A}_i in_k$. Then, for every $in_l$, $\vdash \mathfrak{A}_i in_l$ because $\mathfrak{A}_i$
does not contain any in (this follows from a well-known theorem which will be
stated in the next section; see T26-2b). Therefore, $\vdash (i_k)(\mathfrak{A}_i i_k)$ (T22-1c). Hence,
$\mathfrak{A}_i$ is L-universal (D1a).

b. $\mathfrak{A}_i$, and hence $\mathfrak{A}_j$, is L-empty if and only if i, and hence j, is L-false.
(Analogous to (a).)

c. $\mathfrak{A}_i$, and hence $\mathfrak{A}_j$, is factual if and only if i, and hence j, is factual.
(From (a), (b).)

d. $\mathfrak{A}_i$ and $\mathfrak{A}'_i$ are L-exclusive of each other (and hence likewise $\mathfrak{A}_j$ and
$\mathfrak{A}'_j$) if and only if i and i′ are L-exclusive (and hence likewise j
and j′). (Analogous to (a).)

e. $\mathfrak{A}_i$ L-implies $\mathfrak{A}'_i$ (and hence $\mathfrak{A}_j$ L-implies $\mathfrak{A}'_j$) if and only if $\vdash i \supset i'$
(and hence $\vdash j \supset j'$). (Analogous to (a).)

f. $\mathfrak{A}_i$ and $\mathfrak{A}'_i$ are L-equivalent to each other (and hence likewise $\mathfrak{A}_j$ and
$\mathfrak{A}'_j$) if and only if $\vdash i \equiv i'$ (and hence $\vdash j \equiv j'$). (Analogous to (a).)
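
T1 reduces the L-concepts for molecular predicate expressions of degree one to L-concepts for a single full sentence, and these can be decided by a truth table over the primitive predicates. The sketch below relies on that reduction and on the assumption that the primitive predicates are logically independent. A molecular predicate expression is modeled as a function of the truth-values of two hypothetical primitive predicates `P1` and `P2`.

```python
from itertools import product

def rows(n):
    """All distributions of truth-values over n primitive predicates."""
    return list(product([False, True], repeat=n))

def classify(pred, n):
    values = [pred(*r) for r in rows(n)]
    if all(values):
        return "L-universal"
    if not any(values):
        return "L-empty"
    return "factual"

def l_exclusive(p, q, n):
    return not any(p(*r) and q(*r) for r in rows(n))

def l_implies(p, q, n):
    return all((not p(*r)) or q(*r) for r in rows(n))

def l_equivalent(p, q, n):
    return all(p(*r) == q(*r) for r in rows(n))

M = lambda p1, p2: p1 and not p2       # 'P1 . ~P2'
U = lambda p1, p2: p1 or not p1        # 'P1 V ~P1'
E = lambda p1, p2: p1 and not p1       # 'P1 . ~P1'
P1 = lambda p1, p2: p1
P2 = lambda p1, p2: p2
```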

The following definition is, for the sake of simplicity, formulated with 
respect to molecular predicates. However, it may as well be applied to 
the corresponding molecular predicate expressions, matrices in primitive 
notation, and properties designated, hence in all the ways (A), (B), and 
(C) explained above. 

+D25-4. Let ‘$M_1$’, ‘$M_2$’, ..., ‘$M_p$’ be p molecular predicates of
degree one ($p \geq 2$). These predicates (or the properties designated by
them) form a division $=_{\mathrm{Df}}$ the following three conditions are fulfilled.

a. The predicates are L-disjunct (D1e), that is, $\vdash (x)(M_1x \lor M_2x \lor \ldots
\lor M_px)$.

b. Any two distinct predicates are L-exclusive (D1d); that is, if ‘$M_m$’
and ‘$M_n$’ are distinct, $\vdash (x) \sim(M_mx \,.\, M_nx)$.

c. None of the predicates is L-empty (D1b), that is, for no m,
$\vdash (x) \sim M_mx$.


Hence a division divides all individuals of the system in question into 
kinds or properties which are (a) exhaustive and (b) nonoverlapping.
Consequently, every one of these individuals must belong to exactly one 
of the p kinds. Condition (c) says that none of the properties is impossible; 
this does, however, not exclude the case that some of them happen to be 
empty. 
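
The three conditions for a division can be tested mechanically when the predicates are molecular expressions over a few primitive predicates. The sketch below assumes, as T1 permits, that the L-conditions may be decided by a truth table over the primitive predicates for a single individual; predicates are modeled as functions of the truth-values of two hypothetical primitive predicates.

```python
from itertools import product

def is_division(preds, n_primitives):
    rows = list(product([False, True], repeat=n_primitives))
    l_disjunct = all(any(p(*r) for p in preds) for r in rows)            # condition (a)
    l_exclusive = all(sum(bool(p(*r)) for p in preds) <= 1 for r in rows)  # condition (b)
    none_l_empty = all(any(p(*r) for r in rows) for p in preds)          # condition (c)
    return len(preds) >= 2 and l_disjunct and l_exclusive and none_l_empty

three_kinds = [lambda p1, p2: p1 and p2,       # 'P1 . P2'
               lambda p1, p2: p1 and not p2,   # 'P1 . ~P2'
               lambda p1, p2: not p1]          # '~P1'
overlapping = [lambda p1, p2: p1, lambda p1, p2: p2]
with_impossible = [lambda p1, p2: p1 or not p1, lambda p1, p2: p1 and not p1]
```

The first list forms a division. The second fails both (a) and (b), and the third fails (c) because its second predicate is L-empty.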

T25-2. Every predicate in a division is factual; hence any full sentence
of such a predicate is factual (T1c).


T25-3. Dichotomous division. If ‘$M_1$’ and ‘$M_2$’ form a division, ‘$M_2$’ is
L-equivalent to ‘$\sim M_1$’. (From D4a, b, T20-2m.)

Thus, in this case, the individuals are simply divided into those which
are $M_1$ and the rest.


T25-4. With respect to a given division consisting of p predicates
(p > 2), let ‘M’ be defined by a disjunction of n of those predicates
($1 \leq n < p - 1$), and ‘M′’ be defined by the disjunction of the
remaining p − n predicates. Then the following holds.

a. ‘M’ and ‘M′’ form a division.

b. ‘M′’ is L-equivalent to ‘~M’. (From (a), T3.)
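
T4 can be checked on a small example by the same truth-table method. The sketch assumes a hypothetical division of three kinds over two primitive predicates, takes ‘M’ as the first kind (n = 1) and ‘M′’ as the disjunction of the remaining two, and verifies both parts of the theorem for this one case.

```python
from itertools import product

ROWS = list(product([False, True], repeat=2))

# A hypothetical division with p = 3 over the primitive predicates P1, P2.
division = [lambda p1, p2: p1 and p2,
            lambda p1, p2: p1 and not p2,
            lambda p1, p2: not p1]

def disjunction(preds):
    """Molecular predicate defined by the disjunction of the given predicates."""
    return lambda *r: any(p(*r) for p in preds)

M = disjunction(division[:1])         # n = 1 of the predicates
M_prime = disjunction(division[1:])   # the remaining p - n predicates

exhaustive_and_exclusive = all(M(*r) != M_prime(*r) for r in ROWS)
neither_l_empty = any(M(*r) for r in ROWS) and any(M_prime(*r) for r in ROWS)
equivalent_to_negation = all(M_prime(*r) == (not M(*r)) for r in ROWS)
```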


§ 26. Isomorphic Sentences; Individual and Statistical Distributions 


A. A one-one correlation among the in of a system $\mathfrak{L}$ is called an
in-correlation (D1). If a sentence i is transformed into j by replacing all in with their
correlates with respect to any in-correlation, i and j are called isomorphic (D3a).
Although isomorphic sentences are in general not L-equivalent, nevertheless 
they share the L-properties (T2); e.g., if one is L-true, the other is likewise. 
B. An individual distribution (D6a) is a conjunction of full sentences of predi- 
cates of a division with different in. A statistical distribution (D6c) is a disjunc- 
tion of individual distributions which are mutually isomorphic and hence as- 
sign to the kinds the same numbers of individuals. 


A. Individual Correlations and Isomorphism 


In this section we shall deal with correlations among individual con- 
stants and with transformations of sentences with the help of these corre- 
lations. (The term ‘correlation’ is here used, as is customary in logic, in 
the sense of ‘correspondence’ or ‘one-one relation’, not in its statistical 
sense.) These transformations will be of fundamental importance in our
system of inductive logic. They are also important for deductive logic, as
is shown by the fact that the L-concepts are invariant with respect to these
transformations (T2). These invariances in deductive logic have so far
been studied only rarely (cf. A. Lindenbaum and A. Tarski, “Ueber die
Beschränktheit der Ausdrucksmittel deduktiver Theorien”, Ergebnisse
eines mathematischen Kolloquiums, Heft 7 [1936], pp. 15-23, and F. I. 
Mautner, “An Extension of Klein’s Erlanger Program: Logic as Invariant- 
Theory”, Amer. Journal of Math., 68 [1946], 345-84). 


+D26-1. C is (a correlation of the in, or briefly) an in-correlation
in 𝔏 =Df C is a one-one relation whose domain as well as its converse
domain is the class of all in in 𝔏.


In the following definition D2, (b) will mostly be applied to sentences,
and (c) to classes of sentences. The same holds for the later D3a and b,
respectively.


D26-2. Let C be an in-correlation in 𝔏.

a. in_i is the C-correlate of in_j (in signs of the metalanguage: in_i is
C(in_j)) =Df in_i is that individual constant which is correlated with
in_j by C.

b. The expression 𝔄_i is the C-correlate of the expression 𝔄_j (𝔄_i is
C(𝔄_j)) =Df 𝔄_j is an expression in 𝔏, and 𝔄_i is formed from 𝔄_j by
replacing every in occurring in 𝔄_j with its C-correlate (in the sense (a)).

c. The class of expressions 𝔎_i is the C-correlate of the class of
expressions 𝔎_j (𝔎_i is C(𝔎_j)) =Df 𝔎_j is a class of expressions in 𝔏, and 𝔎_i is
the class of the C-correlates (in the sense (b)) of the expressions
belonging to 𝔎_j.

In order to discuss examples, we sometimes describe an in-correlation
in a form like ‘(a b c / b a c)’; we write on the upper line (here, before
the stroke) all in of the system in question (in the example, of 𝔏_3) and
underneath each in (here, after the stroke) we write its correlate.

T26-1. The number of different in-correlations in 𝔏_N is N!. (From
T40-20b.)
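
The count stated in T26-1 can be checked by brute force for a small system. The following is only a minimal sketch, assuming a system with the three in ‘a’, ‘b’, ‘c’; the name `in_correlations` and the representation of a correlation as a Python dictionary are illustrative choices, not part of the theory.

```python
from itertools import permutations
from math import factorial

def in_correlations(constants):
    """Yield every one-one mapping of the given constants onto themselves."""
    for image in permutations(constants):
        yield dict(zip(constants, image))

constants = ("a", "b", "c")            # assumed example system with N = 3
correlations = list(in_correlations(constants))

# T26-1: the number of in-correlations is N!
print(len(correlations) == factorial(len(constants)))
```

Each dictionary maps an in to its correlate, which corresponds to reading the two-row description of a correlation column by column.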

+D26-3.

a. Let 𝔄_i and 𝔄_j be expressions in 𝔏. 𝔄_i is (in-isomorphic, or briefly)
isomorphic to 𝔄_j =Df there is an in-correlation C such that either
𝔄_i is C(𝔄_j) or, if 𝔄_i is a conjunction, 𝔄_i differs from C(𝔄_j) at most
in the order of the conjunctive components.

b. Let 𝔎_i and 𝔎_j be classes of expressions in 𝔏. 𝔎_i is (in-isomorphic, or
briefly) isomorphic to 𝔎_j =Df there is an in-correlation C such that
𝔎_i is C(𝔎_j).

Isomorphism is obviously reflexive, symmetric, and transitive (D25-2).
In D3a we admit a change in the order of conjunctive components for
the following reason. We shall often apply the concept of isomorphism
to ℨ. ℨ_i in 𝔏_N is a conjunction. C(ℨ_i) is in general not a ℨ; we have
required for the ℨ the lexicographical order of the conjunctive components
(D18-1a), and this order is in general disturbed by the application of C.
If we rearrange the components in C(ℨ_i) in their lexicographical order,
we obtain again a ℨ, say, ℨ_j; this suggests the subsequent definition
D4b. Then, according to D3a, ℨ_i and ℨ_j are isomorphic.


D26-4. Let C be an in-correlation in 𝔏. ℨ_j is constructed from ℨ_i by C =Df

(a) (in 𝔏_∞) ℨ_j is C(ℨ_i);

(b) (in 𝔏_N) ℨ_j is formed from C(ℨ_i) by arranging the conjunctive
components in lexicographical order.


+T26-2. Invariance of L-concepts with respect to in-correlations. Let
C be an in-correlation in 𝔏, i and j be sentences in 𝔏, i′ be C(i), and j′
be C(j).
a. ℜ(i′) is the class of those ℨ which are constructed from the ℨ in ℜ_i
by C (D4).

Proof. 1. If i is an atomic sentence, an identity sentence, or ‘t’, the
assertion follows from T18-1a, b, c, d. 2. If the assertion holds for j and for k, it
holds likewise for ~j, j V k, and j . k (T18-1e, f, g). 3. Let i be (i_k)(𝔐_k). If the
assertion holds for every instance of 𝔐_k, it holds also for i (T18-1h). 4. Every
sentence can be constructed from sentences (whose number may be infinite) of
the forms mentioned in (1) by a finite number n of steps of the four kinds
mentioned in (2) and (3). Hence the assertion follows by mathematical induction
with respect to n.


b. ⊢ i if and only if ⊢ i′. (From D20-1a, (a).)

c. i is L-false if and only if i′ is L-false. (From T20-1a, (b).)

d. L-implication (or L-equivalence, L-disjunctness, L-exclusion,
respectively) holds for i and j if and only if the same relation holds for
i′ and j′. (From T20-1b, c, d, e, (b).)

e. i is factual if and only if i′ is factual. (From (b), (c).)


Note that T2b does not assert that i and i′, i.e., C(i), are L-equivalent.
If i is L-true, i′ is likewise L-true, and hence, in this case, i and i′ are
L-equivalent. However, if i is factual, i′ is, in general, not L-equivalent
to i. [For example, let C be (a b / b a). ‘Pa’ is of course not L-equivalent to
‘Pb’. On the other hand, since ‘Pa V ~Pa’ is L-true, so is ‘Pb V ~Pb’.]
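
The contrast drawn in this remark can be illustrated by computing ranges in a very small system. The sketch below assumes a system with one primitive predicate ‘P’ and the two in ‘a’ and ‘b’; state-descriptions are modelled as truth assignments to the atomic sentences, and the helper names (`sentence_range`, `l_true`, `l_equivalent`) are hypothetical conveniences.

```python
from itertools import product

ATOMS = ("Pa", "Pb")   # assumed tiny system: one predicate, two constants

def state_descriptions():
    """Each state-description is modelled as a dict atom -> truth value."""
    for values in product((True, False), repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def sentence_range(sentence):
    """Indices of the state-descriptions in which the sentence holds."""
    return frozenset(
        k for k, z in enumerate(state_descriptions()) if sentence(z)
    )

def l_true(sentence):
    # L-true: the sentence holds in every state-description.
    return len(sentence_range(sentence)) == 2 ** len(ATOMS)

def l_equivalent(s1, s2):
    # L-equivalent: both sentences have the same range.
    return sentence_range(s1) == sentence_range(s2)

pa = lambda z: z["Pa"]
pb = lambda z: z["Pb"]                      # C('Pa') for C = (a b / b a)
pa_or_not_pa = lambda z: z["Pa"] or not z["Pa"]
pb_or_not_pb = lambda z: z["Pb"] or not z["Pb"]

print(l_equivalent(pa, pb))                         # factual sentence vs. its correlate
print(l_true(pa_or_not_pa), l_true(pb_or_not_pb))   # L-truth carries over
```

The first line printed shows that a factual sentence and its C-correlate need not have the same range; the second shows that L-truth is nevertheless preserved, as T2b asserts.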
The following is a corollary of T2b. 


T26-3. Let l be a purely general sentence (D16-6h) in 𝔏.
a. If i and j are isomorphic sentences and ⊢ i ⊃ l, then ⊢ j ⊃ l.

Proof. Let the conditions be fulfilled. Then there is an in-correlation C such
that j differs from C(i) at most in the order of conjunctive components and
hence is L-equivalent to C(i). Since l does not contain an in (D16-6h), C(l) is l.
Therefore, C(i ⊃ l) is C(i) ⊃ l, and thus is L-equivalent to j ⊃ l. Hence
theorem from T2b.


b. If ℨ_i and ℨ_j are isomorphic ℨ in 𝔏 and l holds in ℨ_i, then l holds
likewise in ℨ_j. (From T20-2t, (a).)


B. Individual and Statistical Distributions 


Many of the most important inductive inferences which we shall dis- 
cuss later are statistical inferences, that is, some of the sentences involved 
speak about frequencies. Therefore we shall often make use of the con- 
cepts of individual and statistical distributions now to be defined. 


+D26-6. Let p molecular predicates ‘M_m’ (m = 1 to p) be given which
form a division in 𝔏 (D25-4), and n in in 𝔏 (n finite, ≥ 1).

a. i is (an individually specified description of a distribution, or briefly)
an individual distribution for the n given in with respect to the
given division (in 𝔏) =Df i is a conjunction of n full sentences of
predicates of the given division with one each of the n given in,
these n conjunctive components being arranged in lexicographical
order.

b. j is the statistical distribution corresponding to i (in 𝔏) =Df i is an
individual distribution for the n given in with respect to the given
division, and j is the disjunction of all those individual distributions
for the same in with respect to the same division which are
isomorphic to i (including i itself), the disjunctive components being
arranged in lexicographical order.

c. j is (a statistical description of a distribution, or briefly) a statistical
distribution for the n given in with respect to the given division
(in 𝔏) =Df there is an individual distribution i for the given in with
respect to the given division, and j is the statistical distribution
corresponding to i (in the sense (b)).

According to our earlier explanation of conjunctions and disjunctions
with n components (§ 16), an individual distribution for one in is a full
sentence with that in. And analogously, if there is no other individual
distribution isomorphic to i, the statistical distribution corresponding to
i is i itself.

The requirement of the lexicographical order in D6a has merely the
purpose of making the form of an individual distribution unique, that is to
say, of making sure that there cannot be two distinct L-equivalent individual
distributions. In other words, every possibility for distributing the given
n individuals among the p properties of the given division is represented
by one and only one of those sentences which we call individual
distributions. The reason for the requirement of the lexicographical order in D6b
is analogous.

The meanings of the terms defined in D6 will become clearer by some
examples. Suppose that ‘M_1’, ‘M_2’, ‘M_3’, ‘M_4’ are defined as molecular
predicates which constitute a division; that the order in which they
have just been given is the lexicographical order of their expansions and
hence also of their full sentences with any in. Let i_1 be ‘M_3a . M_3c . M_4b’.
Then, according to D6a, i_1 is called an individual distribution for ‘a’,
‘b’, ‘c’ with respect to the given division. The term seems natural because i_1
specifies for each of the three individuals a, b, c, to which of the four kinds
in the division it belongs. Let C′ be the in-correlation (a b c / b a c) in 𝔏_3 (or,
in a larger system, any in-correlation beginning in this way). C′(i_1) is
‘M_3b . M_3c . M_4a’; let us call this sentence i_2. Hence i_2 is isomorphic to i_1. On
the other hand, let C be the in-correlation (a b c / c b a). C(i_1) is
‘M_3c . M_3a . M_4b’. Thus, this sentence is likewise isomorphic to i_1. However, it
is not an individual distribution. In order to transform it into one, we have to
rearrange the components in lexicographical order. Then we obtain
‘M_3a . M_3c . M_4b’; but this is i_1 itself. This shows that sometimes the
application of an in-correlation, although it is not the identity-correlation,
does not yield a new individual distribution. There are only three
individual distributions isomorphic to i_1: (1) i_1; (2) i_2; (3) ‘M_3a . M_3b . M_4c’,
which we call i_3. Each of these three sentences has first two full sentences
of ‘M_3’ and then one of ‘M_4’; hence they agree in the numbers of
individuals assigned to the four kinds and differ only with respect to the
individuals themselves.

Let j_1 be the disjunction of the three sentences in lexicographical order,
that is, i_3 V i_1 V i_2. Then, according to D6b, j_1 is the statistical
distribution corresponding to the individual distribution i_1, and also that
corresponding to i_2 and that corresponding to i_3. i_1, i_2, and i_3 represent the
three possibilities for distributing the individuals a, b, c in such a way
among the four kinds in the given division that two individuals belong
to M_3 and one to M_4 and none to the other kinds. Therefore, what is
expressed by the disjunction j_1 is neither more nor less than the way of
distribution just described; hence j_1 says that, of the individuals a, b, c, two
belong to M_3 and one to M_4 and hence none to M_1 and M_2. Thus j_1, in
distinction to i_1 or i_2 or i_3, does not specify which individuals among a, b, c
belong to each of the four kinds, but only how many of them do; it states
that the numbers of the given individuals belonging to the four kinds are
0, 0, 2, 1, respectively. This is the reason why we call j_1 a statistical
distribution, in distinction to the individual distributions i_1, i_2, i_3.
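
The example can be retraced mechanically. The following sketch assumes that an individual distribution may be modelled as a tuple of (kind, in) pairs and that the lexicographical order is the order by kind first and by in second; both are simplifying assumptions, and the function names are illustrative only.

```python
from itertools import permutations
from collections import Counter

KINDS = ("M1", "M2", "M3", "M4")   # the division, in lexicographical order
INS = ("a", "b", "c")

def order_key(distribution):
    return [(KINDS.index(kind), ind) for kind, ind in distribution]

def normalize(components):
    """Arrange (kind, in) pairs in lexicographical order (D6a)."""
    return tuple(sorted(components, key=lambda c: (KINDS.index(c[0]), c[1])))

def apply_correlation(corr, distribution):
    """Replace every in by its correlate, then restore the order."""
    return normalize((kind, corr[ind]) for kind, ind in distribution)

def statistical_distribution(distribution):
    """All individual distributions isomorphic to the given one (D6b)."""
    images = {
        apply_correlation(dict(zip(INS, image)), distribution)
        for image in permutations(INS)
    }
    return sorted(images, key=order_key)

def cardinal_numbers(distribution):
    counts = Counter(kind for kind, _ in distribution)
    return tuple(counts[kind] for kind in KINDS)

i1 = normalize([("M3", "a"), ("M3", "c"), ("M4", "b")])
j1 = statistical_distribution(i1)          # the disjuncts of j_1, in order
for d in j1:
    print(" . ".join(kind + ind for kind, ind in d))
print(cardinal_numbers(i1))
```

Although six in-correlations are applied, only three distinct individual distributions result, in agreement with the count 3!/(2! · 1!) of ways to assign two of three individuals to one kind and the remaining one to another.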

Suppose that i is an individual distribution containing n_1 full sentences
of ‘M_1’, n_2 of ‘M_2’, ..., n_p of ‘M_p’, and j is the statistical distribution
corresponding to i. Then we shall sometimes say that i is an individual
distribution and j a statistical distribution with respect to ‘M_1’, ..., ‘M_p’
with the cardinal numbers n_1, ..., n_p.


T26-5. For a given division and for n given in, let i and k be individual
distributions and j the statistical distribution corresponding to i.
a. k is factual.

Proof. The full sentences of the molecular predicates in k are not L-true
(T25-2). No two of them have an in in common. Therefore, k is factual (from
T21-11a by mathematical induction with respect to n).


b. If i and k are distinct (i.e., not the same sentence) then they are
L-exclusive, hence ⊢ k ⊃ ~i.


Proof. If i and k contained the same conjunctive components, they would be
the same sentence because of the requirement of lexicographical order in D6a.
Therefore, there must be at least one in which occurs in i in a full sentence of a
molecular predicate different from that with which it occurs in k. These two
full sentences are L-exclusive (D25-4b), and hence likewise i and k.


c. ⊢ i ⊃ j. (From D6b.)

d. If j is not the statistical distribution corresponding to k, then
⊢ k ⊃ ~j.

Proof. Let the condition be fulfilled. Then k must be distinct from i, and
hence ⊢ k ⊃ ~i (b), and likewise with any other individual distribution to which
j corresponds, say, i′, i″, etc. Therefore, ⊢ k ⊃ ~i . ~i′ . ~i″ . ...
(T21-5m(9)); hence ⊢ k ⊃ ~(i V i′ V i″ V ...) (T21-5f(2)), hence ⊢ k ⊃ ~j.


e. If ⊢ k ⊃ j, then j is the statistical distribution corresponding to k.


Proof. Suppose ⊢ k ⊃ j. Then not ⊢ k ⊃ ~j, because otherwise ⊢ k ⊃ j . ~j
(T21-5m(8)) and hence ⊢ ~k, which is not the case (a). Therefore, j is the
statistical distribution corresponding to k (d).


f. The following three conditions are logically equivalent to one
another (i.e., if one of them holds, the others hold also):
(1) j is the statistical distribution corresponding to k,
(2) k is isomorphic to i,
(3) k L-implies j.

(From D6a, (c), (e).)

It is easily seen that any statistical distribution for all the N in in 𝔏_N
can be transformed into an L-equivalent, purely general sentence. The
transformation is analogous to that of a structure-description which will
be explained in the next section.


§ 27. Structure-Descriptions (Str) 


All those ℨ which are isomorphic to ℨ_i in 𝔏_N ascribe the same structural
features to the primitive attributes of 𝔏_N. Therefore, we call the disjunction of
these ℨ (in a certain order) a structure-description (Str) (D1). This concept will
play an important role in the later definition of degree of confirmation.


The most important use of the concept of isomorphism among sentences
is its application to the ℨ in finite systems 𝔏_N. Let us consider an example
in a system 𝔏_3 with the three in ‘a’, ‘b’, ‘c’, and with only two primitive
predicates, ‘P’ of degree one and ‘R’ of degree two. This system contains
twelve atomic sentences, three with ‘P’ and nine with ‘R’. The following
conjunction of twelve basic sentences is an example of a state-description,
which we will call ℨ_1: ‘Pa . Pc . Rab . Rbc . Rcb . ~Pb . ~Raa . ~Rac .
~Rba . ~Rbb . ~Rca . ~Rcc’. As an example of an in-correlation in 𝔏_3,
let us take C: (a b c / b c a). Then C(ℨ_1) (D26-2b) is ‘Pb . Pa . Rbc . Rca .
Rac . ~Pc . ~Rbb . ~Rba . ~Rcb . ~Rcc . ~Rab . ~Raa’. This, however,
is not a ℨ; we obtain a ℨ, which we will call ℨ_2, by rearranging the
components in lexicographical order: ‘Pa . Pb . Rac . Rbc . Rca . ~Pc .
~Raa . ~Rab . ~Rba . ~Rbb . ~Rcb . ~Rcc’. Thus ℨ_2 is constructed
from ℨ_1 by C (D26-4) and hence is isomorphic to ℨ_1 (D26-3a).
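
The construction of ℨ_2 from ℨ_1 can be followed step by step in a short sketch. It assumes that basic sentences are written as strings such as 'Rab' and '~Rab' with one-letter predicates and in, and that the lexicographical order places the unnegated atomic sentences first and the negations after them, each group alphabetically, as in the two conjunctions displayed above. The helper names are hypothetical.

```python
def parse(component):
    """Split a basic sentence like '~Rab' into (negated, predicate, args)."""
    negated = component.startswith("~")
    atom = component.lstrip("~")
    return negated, atom[0], atom[1:]

def correlate(corr, component):
    """C-correlate of a basic sentence: replace each in by its correlate."""
    negated, pred, args = parse(component)
    new_args = "".join(corr[c] for c in args)
    return ("~" if negated else "") + pred + new_args

def lexicographical(components):
    """Unnegated atomic sentences first, then negations, each alphabetically."""
    return sorted(components, key=lambda c: (c.startswith("~"), c.lstrip("~")))

Z1 = ["Pa", "Pc", "Rab", "Rbc", "Rcb",
      "~Pb", "~Raa", "~Rac", "~Rba", "~Rbb", "~Rca", "~Rcc"]
C = {"a": "b", "b": "c", "c": "a"}       # the in-correlation (a b c / b c a)

C_of_Z1 = [correlate(C, comp) for comp in Z1]   # D26-2b
Z2 = lexicographical(C_of_Z1)                   # D26-4b
print(" . ".join(C_of_Z1))
print(" . ".join(Z2))
```

The first printed line reproduces C(ℨ_1) with its components still in the disturbed order; the second is the rearranged conjunction ℨ_2.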

If two ℨ are isomorphic, we shall sometimes also say that they represent
or have the same structure, thus extending the use of this term which
is ordinarily applied to single relations (Russell’s term ‘relation-number’)
or their predicates.

The ℨ are in certain respects similar to individual distributions. [We
shall see later that in systems which contain only pr of degree one, any ℨ
can even be transformed into an L-equivalent individual distribution
for all in (T34-1).] For a given ℨ_i and an in-correlation C, even if C is
not identity, sometimes the ℨ constructed from ℨ_i by C is ℨ_i itself; this
is analogous to an example with individual distributions considered in
the preceding section. For a given ℨ_i in 𝔏_N, the number of ℨ isomorphic
to it is at most N!, because this is the number of in-correlations in 𝔏_N
(T26-1); but for the reason just mentioned, it is often smaller and
sometimes as small as 1. As an example for the latter case, take that ℨ_i in 𝔏_N
whose components are the atomic sentences of 𝔏_N; here, obviously, any
ℨ constructed from ℨ_i by any in-correlation is ℨ_i itself; in other words,
there is no other ℨ isomorphic to ℨ_i. [For those systems which contain
only pr of degree one, we shall later give a theorem stating the number of
those ℨ which are isomorphic to any given ℨ_i (T35-4).]

Let us go back to the example of ℨ_1 and ℨ_2 in 𝔏_3. The number of
in-correlations for 𝔏_3 is 3! = 6 (T26-1). If we construct a ℨ from ℨ_1 by each
of these six correlations, it turns out that in this particular case all
correlations, except of course the identity (a b c / a b c), lead to ℨ which are
distinct from ℨ_1. Hence here we have six ℨ isomorphic to ℨ_1. Among them are ℨ_1
itself and ℨ_2; the others, which we will not actually construct here, may
be called ℨ_3, ℨ_4, ℨ_5, and ℨ_6. Let j be the disjunction ℨ_1 V ℨ_2 V ℨ_3 V ℨ_4 V
ℨ_5 V ℨ_6. Since the ℨ are similar to individual distributions, j is similar
to a statistical distribution. We can easily see that j is L-equivalent (in 𝔏_3)
to the following sentence h, which is purely general, that is, does not
contain any in: ‘(∃x)(∃y)(∃z)[x ≠ y . x ≠ z . y ≠ z . Px . Pz . Rxy . Ryz .
Rzy . ~Py . ~Rxx . ~Rxz . ~Ryx . ~Ryy . ~Rzx . ~Rzz]’.

Proof. h is L-equivalent to a disjunction of all instances of the matrix
included in square brackets (T22-3d). If the same in is substituted for two or
three of the variables, then at least one of the ≠-sentences is L-false and hence
the whole conjunction likewise; therefore, an instance of this kind may be
dropped as a component of the disjunction. In this way only those six instances
remain in which three distinct in are substituted for the variables. Here, all
≠-sentences are L-true and hence may be dropped as conjunctive components.
Thus, the instance resulting from the substitution of ‘a’, ‘b’, ‘c’ for ‘x’, ‘y’, ‘z’,
respectively, is transformed into ℨ_1. And the transformation of the whole leads
to a sentence which differs from j at most in the order of conjunctive or
disjunctive components.


ℨ_1, and likewise any other ℨ, states for every individual in 𝔏_3, whether
or not it has the property P, and for every ordered pair of individuals,
whether or not the relation R holds for them. The sentence j, on the other
hand, does not give specific information about the particular individuals;
however, it still says something about the three individuals of 𝔏_3, though
only in a general way. Among other things, j says for instance the
following, as can easily be seen by an inspection of h: (1) there are just two of
the three individuals possessing the property P; (2) none of the
individuals bears the relation R to itself, in other words, R is irreflexive; (3) R
is not symmetric; (4) R is not asymmetric; (5) if x and y are P, R does
not hold between x and y. Those features of properties and relations which
can be expressed in a purely general way, that is, without the use of in,
are called structural features (as examples, see the concepts defined in
D25-1a, b, d, e and D25-2a to g without the prefix ‘L-’). We see that j
describes structural features of P (e.g., (1)), of R (e.g., (2), (3), (4)), and of
P and R together (e.g., (5)). And, moreover, j does not leave open any
question with respect to structural features of P and R; any such feature
which is expressible in 𝔏_3 is either affirmed or denied by j, because for
any purely general sentence l either ⊢ j ⊃ l or ⊢ j ⊃ ~l, as we shall see
(T3b).
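
That the features (1) to (5) are shared by all ℨ isomorphic to ℨ_1 can be confirmed by enumeration. The sketch below assumes that a state-description of this system is fully determined by the set of individuals that are P and the set of ordered pairs for which R holds; the dictionary keys are informal paraphrases of the five features, not formulations taken from the text.

```python
from itertools import permutations, product

INS = ("a", "b", "c")
# Z_1 from the example, given by its unnegated components only.
P1 = {"a", "c"}
R1 = {("a", "b"), ("b", "c"), ("c", "b")}

def transform(corr, P, R):
    """State-description obtained by replacing each in by its correlate."""
    return (frozenset(corr[x] for x in P),
            frozenset((corr[x], corr[y]) for x, y in R))

def structural_features(P, R):
    return {
        "exactly two individuals are P": len(P) == 2,
        "R is irreflexive": all((x, x) not in R for x in INS),
        "R is not symmetric": any((y, x) not in R for x, y in R),
        "R is not asymmetric": any((y, x) in R for x, y in R),
        "R does not hold between two P": all(
            (x, y) not in R for x, y in product(P, repeat=2)),
    }

isomorphic = {transform(dict(zip(INS, image)), P1, R1)
              for image in permutations(INS)}

print(len(isomorphic))   # distinct state-descriptions isomorphic to Z_1
print(all(all(structural_features(P, R).values()) for P, R in isomorphic))
```

Since each of the five conditions is checked without reference to any particular in, its value cannot change under an in-correlation; the enumeration merely makes this visible for the six ℨ of the example.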

Thus we see that j describes those structural features of the pr of 𝔏_3
which are expressed by ℨ_1, and likewise by each of those ℨ which are
isomorphic to ℨ_1. We might call the totality of these structural features
the structure of ℨ_1, which is the same as the structure of each of the ℨ
isomorphic to ℨ_1. Each of these ℨ describes this structure but, in addition,
gives specific information about the individuals. On the other hand, j
describes just this structure and does not say anything more. The same
holds of course for any sentence L-equivalent to j, for instance, h.
However, we shall apply the term ‘structure-description’ and the synonymous
sign ‘Str’ to only one of the sentences L-equivalent to j, viz., the
disjunction formed from j by arranging the disjunctive components in
lexicographical order (D1). The reason is the same as in the analogous case
of statistical distributions: on the basis of our definition there is exactly
one structure-description for every structure of the universe of a given
system. We shall define the term ‘structure-description’ only for finite
systems 𝔏_N, because it is only in these systems that the ℨ are sentences
and hence a disjunction of them can be formed. [While the term
‘structure-description’ is a well-defined technical term of our theory both in
deductive and in inductive logic, we use the term ‘structure’ only in informal
explanations like those just given.]


+D27-1.

a. j is the structure-description corresponding to ℨ_i (or, ℨ_i belongs to the
structure-description j) in 𝔏_N =Df ℨ_i is a ℨ in 𝔏_N, and j is the
disjunction of all ℨ which are isomorphic to ℨ_i, arranged in
lexicographical order.

b. j is a structure-description (Str) in 𝔏_N =Df there is a ℨ_i in 𝔏_N such
that j is the structure-description corresponding to ℨ_i (in the
sense (a)).


D1a and b are analogous to D26-6b and c for ‘statistical distribution’;
thus the remarks following D26-6 hold here in an analogous way. In
particular, if there is no other ℨ isomorphic to ℨ_i, then the
structure-description corresponding to ℨ_i is ℨ_i itself.

In the beginning of this section we have given two examples of ℨ in a
system 𝔏_3 with only two pr. The corresponding Str is a disjunction of six
such ℨ. This shows that even in very poor systems the Str are sometimes
rather long sentences. If we had actually to write down some Str,
the long form would be rather awkward, and it would be more convenient
to choose the much shorter general form (for example, the sentence h
with three existential quantifiers in the earlier example) as the standard
form for Str. However, we shall hardly ever have to write down a Str in
the course of our discussions. It is true, the concept of Str will play a
fundamental role in our system of inductive logic, and we shall not only
state general theorems but also often deal with concrete examples, for
instance, carry out numerical computations for the degree of confirmation
for given sentences. In a case of this kind, we may actually write down the
sentences in question; we shall then have to speak about the ℨ in which
they hold and the Str corresponding to these ℨ, and we shall have to
calculate the number of ℨ belonging to a Str. But even in such cases it will
not be necessary to write down the ℨ (although we shall occasionally do
so, as in this section), and we shall not write down any Str. Therefore
there is no inconvenience in choosing the disjunctive form for the Str.
And it seems that this form shows the logical relations between the ℨ and
the Str in a simpler way. We imagine, without actually carrying it out,
classifying all ℨ in a given system 𝔏_N with respect to their structure, that
is, dividing them into classes of mutually isomorphic ℨ; and then we imagine
constructing the Str simply as the disjunctions of the ℨ in each class.
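
For a system as small as the 𝔏_3 of our example, the classification that is here only imagined can in fact be carried out by machine. The following sketch assumes the same system (three in, ‘P’ of degree one, ‘R’ of degree two) and models each ℨ by the set of atomic sentences it affirms; the names `transform` and `isomorphism_class` are hypothetical.

```python
from itertools import permutations, product

INS = ("a", "b", "c")
ATOMS = [("P", (x,)) for x in INS] + [("R", xy) for xy in product(INS, repeat=2)]
CORRELATIONS = [dict(zip(INS, image)) for image in permutations(INS)]

def transform(corr, state):
    """Image of a state-description (set of affirmed atoms) under corr."""
    return frozenset((pred, tuple(corr[x] for x in args)) for pred, args in state)

def isomorphism_class(state):
    """All state-descriptions isomorphic to the given one."""
    return frozenset(transform(corr, state) for corr in CORRELATIONS)

all_states = [
    frozenset(atom for atom, affirmed in zip(ATOMS, values) if affirmed)
    for values in product((True, False), repeat=len(ATOMS))
]
classes = {isomorphism_class(state) for state in all_states}

print(len(all_states))                     # number of state-descriptions
print(len(classes))                        # one structure-description per class
print(sorted({len(c) for c in classes}))   # class sizes that occur
```

Each class found in this way corresponds to exactly one Str, namely the disjunction of its members in lexicographical order; the class sizes show how often fewer than 3! distinct ℨ arise from a given ℨ.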
The following theorem is analogous to a previous theorem on statistical 
distributions (T2a, b, c, f (1, 2, 3) correspond to T26-5c, d, e, f, respec- 
tively). 
T27-2. Let ℨ_i and ℨ_k be any ℨ in 𝔏_N, and Str_j be the
structure-description corresponding to ℨ_i.
a. ⊢ ℨ_i ⊃ Str_j. (From D1a.)
b. If ℨ_k does not belong to Str_j, then ⊢ ℨ_k ⊃ ~Str_j. (From T21-8a, in
analogy to T26-5d.)
c. If ⊢ ℨ_k ⊃ Str_j, then ℨ_k belongs to Str_j. (From T20-5b, (b), in
analogy to T26-5e.)
d. Str_j holds in ℨ_i. (From (a), T20-2t.)
e. If Str_j holds in ℨ_k, then ℨ_k belongs to Str_j. (From T20-2t, (c).)
f. The following four conditions are logically equivalent to each other
(i.e., if one of them holds, the others hold also):
(1) ℨ_k belongs to Str_j,

(2) ℨ_k is isomorphic to ℨ_i,

(3) ℨ_k L-implies Str_j,

(4) Str_j holds in ℨ_k.
(From D1a, (a), (c), (d), (e).)


The following theorem T3 shows that the relation between purely
general sentences (D16-6h) and Str is similar to the relation between
sentences of any form and ℨ. In particular, we found earlier that any not
L-false sentence in 𝔏_N is L-equivalent to a disjunction of ℨ (T21-8c).
Analogously, we find now that any not L-false, purely general sentence
in 𝔏_N is L-equivalent to a disjunction of Str (T3c).


T27-3. Let l be a purely general sentence in 𝔏_N.
a. If l holds in ℨ_i, and Str_j is the Str corresponding to ℨ_i, then
⊢ Str_j ⊃ l.


Proof. Let ℨ_i, ℨ_i′, ℨ_i″, etc., be the ℨ isomorphic to ℨ_i. Then Str_j differs from
ℨ_i V ℨ_i′ V ℨ_i″ V ... at most in the order of disjunctive components and hence is
L-equivalent to this disjunction. If l holds in ℨ_i, ⊢ ℨ_i ⊃ l (T20-2t), and hence
⊢ ℨ_i′ ⊃ l, ⊢ ℨ_i″ ⊃ l, etc. (T26-3a), hence ⊢ (ℨ_i ⊃ l) . (ℨ_i′ ⊃ l) . ... (T20-2p),
hence ⊢ ℨ_i V ℨ_i′ V ... ⊃ l (T21-5n(4)), hence ⊢ Str_j ⊃ l.


b. For any Str_j in 𝔏_N, either ⊢ Str_j ⊃ l or ⊢ Str_j ⊃ ~l.


Proof. Let ℨ_i be one of the ℨ belonging to Str_j. Then either l or ~l holds
in ℨ_i (T19-2). Hence theorem from (a).


c. If l is not L-false, l is L-equivalent to a disjunction of n Str in 𝔏_N
(n ≥ 1).


Proof. If l is not L-false, ℜ_l is not empty. Let h be a disjunction of all ℨ in
ℜ_l such that mutually isomorphic ℨ stand together as a subdisjunction of h
and are arranged within this subdisjunction in lexicographical order. Then l is
L-equivalent to h (T21-8c). Each of the subdisjunctions contains all ℨ which
are isomorphic to any ℨ occurring in it (T26-3b) and hence is a Str. Thus, h is
a disjunction of Str.


§ 28. Correlations for Basic Matrices 


Correlations among basic matrices are defined (D1). They are analogous to
in-correlations (§ 26A); they are, however, less important and will seldom be
used. L-concepts are invariant with respect to transformations by these
correlations (T2).


The concepts introduced and discussed in this section will not often be
used, and then chiefly in Volume II.
The correlations defined by D1 have a certain similarity to the
in-correlations (D26-1); they are somewhat more complicated but less
important.


D28-1. C is a correlation of the basic matrices or an 𝔐-correlation in 𝔏 =Df
C is a one-one relation between expressions in 𝔏 satisfying the following
conditions.

a. To every atomic matrix 𝔐_i (D16-3b) of the form pr_i i_1 i_2 ... i_n
(containing pr_i of the degree n and the n alphabetically first i in
their alphabetical order) exactly one expression is correlated by C;
we call it the C-correlate of 𝔐_i or, in signs, C(𝔐_i).

b. If 𝔐_i has the form described in (a), then C(𝔐_i) is a basic 𝔐 (D16-3c)
with a pr of the same degree n as pr_i (it may be pr_i itself) and the
same variables as in 𝔐_i in any order. (Thus C(𝔐_i) may be 𝔐_i
itself.)

c. If both 𝔐_i and 𝔐_j have the form described in (a) but with two
distinct pr, then C(𝔐_i) and C(𝔐_j) contain two pr which are likewise
distinct from one another (but, as mentioned in (b), not necessarily
distinct from the two pr occurring in 𝔐_i and 𝔐_j).


The following definition has a certain analogy to D26-2, but is
somewhat more complicated. The use of ‘C(𝔐_i)’, as defined in D1, is hereby
extended to new cases.


D28-2. Let C be an 𝔐-correlation in 𝔏.

a. Let 𝔐_i be an atomic matrix of a form different from that described
in D1a. Let 𝔐_j be that atomic matrix of the form described in D1a
which contains the same pr as 𝔐_i; hence 𝔐_i can be formed from 𝔐_j
by certain substitutions for the variables. C(𝔐_i) =Df the expression
(a basic matrix) formed from C(𝔐_j) by those same substitutions.

b. Let 𝔐_i be atomic. (1) If C(𝔐_i) is atomic, C(~𝔐_i) =Df ~C(𝔐_i).
(2) If C(𝔐_i) is not atomic and hence of the form ~𝔐_k, C(~𝔐_i)
=Df 𝔐_k.

c. Let 𝔐_i be a non-basic matrix in 𝔏. C(𝔐_i) =Df the expression
(matrix) formed from 𝔐_i by replacing every occurrence of any basic 𝔐_j
with C(𝔐_j) (as determined by D1 or (a) or (b)). (If ~𝔐_j occurs with
an atomic 𝔐_j, then ~𝔐_j is to be replaced by its correlate
(determined by (b)).)

d. Let 𝔎_i be a class of matrices (which may be sentences) in 𝔏. C(𝔎_i)
=Df the class of the C-correlates of the elements of 𝔎_i.


Example. ‘Rxy’ has the form described in D1a. According to D1, we
may choose as its C-correlate any basic matrix of degree two with the
variables ‘x’ and ‘y’ in any order. Suppose we choose ‘~Syx’. Then the
C-correlate of all other basic matrices containing ‘R’ is determined by D2.
Thus, according to D2a, C(‘Ryx’) is ‘~Sxy’; C(‘Rac’) is ‘~Sca’; further,
according to D2b(2), C(‘~Rac’) is ‘Sca’. Then, we can find the C-correlate
of any matrix containing no other pr than ‘R’ with the help of D2c;
thus, C(‘~Rac V (x)(∃y)Ryx’) is ‘Sca V (x)(∃y)~Sxy’.

D3 is analogous to D26-4. 


D28-3. Let C be an 𝔐-correlation in 𝔏. ℨ_j is constructed from ℨ_i by
C =Df
(a) (in 𝔏_∞) ℨ_j is C(ℨ_i) (in the sense of D2d),
(b) (in 𝔏_N) ℨ_j is formed from C(ℨ_i) by arranging the conjunctive
components in their lexicographical order.

T2 is analogous to T26-2.


T28-2. Invariance of L-concepts with respect to 𝔐-correlations. Let C
be an 𝔐-correlation in 𝔏, i and j be sentences in 𝔏, i′ be C(i), and j′
be C(j).

a. ℜ(i′) is the class of those ℨ which are constructed from the ℨ in ℜ_i
by C (D3). (Proof analogous to T26-2a.)

b. ⊢ i if and only if ⊢ i′. (From D20-1a, (a).)

c. i is L-false if and only if i′ is L-false. (From T20-1a, (b).)

d. L-implication (or L-equivalence, L-disjunctness, L-exclusion,
respectively) holds for i and j if and only if the same relation holds
for i′ and j′. (From T20-1b, c, d, e, (b).)

e. i is factual if and only if i′ is factual. (From (b), (c).)


Example. Since ‘(x)[~Rax ⊃ (∃y)~Ryx]’ is L-true, ‘(x)[Sxa ⊃ (∃y)Sxy]’
is likewise L-true.


T28-3. Let C be an 𝔐-correlation and C′ an in-correlation in 𝔏.
a. For any i in 𝔏, C(C′(i)) is the same as C′(C(i)).

Proof. The transformation by C′ concerns only the in. The transformation
by C may change three things: (1) a sign of negation may be added or removed,
(2) a pr may be replaced by another one, (3) the order of the argument signs
may be changed. Thus the two transformations are independent of each other,
and hence the order in which they are carried out is irrelevant for the final result.


b. For any ℨ_i in 𝔏, the one ℨ constructed by C (D3) from the one ℨ
constructed by C′ (D26-4) from ℨ_i is the same as the one ℨ
constructed by C′ from the one ℨ constructed by C from ℨ_i. (Proof
analogous to (a).)
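
The independence of the two kinds of transformation asserted in T3a can be spot-checked on basic sentences. The sketch assumes the 𝔐-correlation of the earlier example (‘Rxy’ correlated with ‘~Syx’) and a system with the three in ‘a’, ‘b’, ‘c’; the triple encoding of basic sentences and the function names are hypothetical, and the check covers only basic sentences with ‘R’, not sentences of arbitrary form.

```python
from itertools import permutations, product

INS = ("a", "b", "c")
# Assumed M-correlation, as in the example: 'Rxy' is correlated with '~Syx'.
# Entry: (adds a negation sign, new predicate, order of the argument signs).
M_RULE = {"R": (True, "S", (1, 0))}

def by_m_correlation(literal):
    negated, pred, args = literal
    flip, new_pred, order = M_RULE[pred]
    return (negated != flip, new_pred, tuple(args[k] for k in order))

def by_in_correlation(corr, literal):
    negated, pred, args = literal
    return (negated, pred, tuple(corr[x] for x in args))

literals = [(neg, "R", args)
            for neg in (False, True)
            for args in product(INS, repeat=2)]

commutes = all(
    by_m_correlation(by_in_correlation(corr, lit))
    == by_in_correlation(corr, by_m_correlation(lit))
    for image in permutations(INS)
    for corr in [dict(zip(INS, image))]
    for lit in literals
)
print(commutes)
```

The check succeeds because one transformation touches only the in while the other touches only the negation sign, the pr, and the order of the argument positions, which is the reasoning of the proof above.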


§ 29. Some Numbers Connected with the Systems 𝔏


Some numbers which are characteristic for any system 𝔏, especially for any
finite system 𝔏_N, are defined. τ is the number of the Str (D1a), ζ the number
of the ℨ (D2b) in 𝔏_N.

In this section we introduce into the metalanguage some symbols for
certain numbers with respect to any given system 𝔏. [Strictly speaking,
these symbols designate numerical functions whose arguments are the
systems 𝔏; hence, in a complete notation we ought to write, for instance,
‘ζ(𝔏)’ for ‘the number of ℨ in the system 𝔏’; however, since the context
will usually make clear which system is meant, we shall simply write
‘ζ’ instead.]

We intend to define ‘τ’ in such a way that it designates the number of
structures in the system in question. (We do not take ‘σ’ for this purpose
because it is the customary symbol for the standard deviation.) However,
the definition of ‘τ’ will not contain the term ‘structure’, because we use
this term only in an informal way and have not given a technical
definition for it. Instead, the definition will refer to something within the
language system that represents the structures. In 𝔏_N, we take, of course, the
Str for this purpose (D1a); in 𝔏_∞, we may take the classes of isomorphic
ℨ (D1b), in accordance with an earlier remark.

D29-1.

a. For 𝔏_N. τ =Df the number of Str in 𝔏_N.

b. For 𝔏_∞. τ =Df the number of those classes of ℨ in 𝔏_∞ which contain,
for some ℨ_i, exactly those ℨ which are isomorphic to ℨ_i.

D29-2. For any finite or infinite system 𝔏.

a. β =Df the number of atomic sentences (and hence of basic pairs) in 𝔏.

b. ζ =Df the number of ℨ in 𝔏.

c. ρ =Df the number of ranges (that is, of all classes of ℨ) in 𝔏.

T29-1. The following holds for any finite or infinite system 𝔏.

a. ζ = 2^β. (From T40-31g.)

b. ρ = 2^ζ. (From T40-31h.)

T29-2. In any finite system 𝔏_N, the number of largest classes of mutually
L-equivalent sentences is ρ, hence 2^ζ (T1b).


Proof. Sentences are L-equivalent if and only if they have the same range
(D20-1d). Furthermore, for every class ℜ_i of ℨ in 𝔏_N, there is a sentence h
whose range is ℜ_i; if ℜ_i is non-empty, we take as h a disjunction of the ℨ in ℜ_i
(T21-8d); if ℜ_i is empty, we take ‘~t’.


We use the term ‘proposition’ in such a sense that two sentences are
said to express the same proposition if and only if they are L-equivalent
([Semantics], p. 92; [Meaning], p. 27). Hence ρ = 2^ζ is likewise the number
of propositions expressible by sentences in 𝔏_N, and also the number
of propositions expressible by classes of sentences (T21-8e). Thus we see
that the number of propositions expressible in 𝔏_N is finite (but, as we shall
find later, this number is enormously large, even for rather narrow systems;
see § 35), although the number of sentences and still more the
number of classes of sentences in 𝔏_N is infinite (the first is denumerable,
the second nondenumerable).
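
How quickly ρ grows can be seen from a small computation. The following Python sketch evaluates β, ζ, and ρ according to D29-2 and T29-1. The function name `system_numbers` and its parameters are hypothetical, and the sketch assumes the usual count that a pr of degree n has N^n full sentences when there are N in.

```python
def system_numbers(num_individuals, predicate_degrees):
    """Return (beta, zeta, rho) for a finite system.

    num_individuals   -- N, the number of in
    predicate_degrees -- one degree per pr, e.g. [1, 1, 1] for three properties
    """
    # Assumption of this sketch: a pr of degree n yields N**n atomic sentences.
    beta = sum(num_individuals ** degree for degree in predicate_degrees)
    zeta = 2 ** beta   # T29-1a: number of state-descriptions
    rho = 2 ** zeta    # T29-1b: number of ranges, hence of propositions (T29-2)
    return beta, zeta, rho


beta, zeta, rho = system_numbers(2, [1, 1])
print(beta, zeta, rho)  # 4 16 65536
```

With three in and two properties the same call already gives ρ = 2^64, which illustrates why the number of propositions is finite but very large even in narrow systems.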

The following theorem speaks about infinite cardinal numbers (for ‘ℵ₀’,
etc., see D40-8); it is merely intended to give some additional information
about the logicomathematical nature of certain classes of expressions in
𝔏_∞, but it will not be used for the later construction of our system of
inductive logic.


T29-4. The following holds for 𝔏_∞.

a. The following classes of expressions in 𝔏_∞ are denumerable, hence
their cardinal number is ℵ₀: (1) the in; (2) the atomic sentences
and hence the basic pairs (β); (3) the sentences; (4) the expressions.

b. The number of classes of sentences is ℵ₁. (From (a)(3), T40-31h.)

c. ζ = ℵ₁. (From T1a, (a)(2).)

d. ρ = ℵ₂. (From T1b, (b).)

e. The number of propositions expressible by sentences is ℵ₀.

Proof. 1. This number cannot be larger than ℵ₀ (a)(3). 2. The atomic sentences
of the infinite sequence ‘Pa₁’, ‘Pa₂’, etc., express different propositions
because no two of them are L-equivalent. Their number is ℵ₀. Therefore the
whole number of propositions cannot be smaller than ℵ₀.


f. The number of propositions expressible by classes of sentences is ℵ₁.


Proof. 1. This number cannot be larger than ℵ₁ (b). 2. The subclasses of the
sequence of atomic sentences mentioned in the proof of (e) express different
propositions because no two of them are L-equivalent. Their number is ℵ₁
(T40-31h). Therefore the number sought for cannot be smaller than ℵ₁.


T4e and f show that in 𝔏_∞, in distinction to 𝔏_N (T2), the number of
propositions expressed by sentences and that of propositions expressed
by classes of sentences are different and are both smaller than ρ.


§ 31. The Systems 𝔏^π; the Q-Predicates


§§ 31-38 deal only with properties, not relations. If a system 𝔏 has primitive
predicates for properties only, we designate the number of these predicates
by ‘π’ and the system by ‘𝔏^π’. We define the molecular predicates ‘Q₁’, ‘Q₂’,
etc., by conjunctions in which every primitive predicate or its negation occurs
(A1). Thus these Q-predicates (D1) designate the strongest factual properties
expressible in the system. Every factual molecular property expressible in the
system is expressible by a disjunction of some of the Q-predicates (T2f). These
predicates constitute a division (T2d). Their number is κ (D2), = 2^π (T1).


The following part of this chapter (§§ 31-38) deals with that part of 
the deductive logic of attributes which is most important both for de- 
ductive and for inductive logic, viz., the logic of properties in distinction 
to the logic of relations. Although our procedure here is, of course, based 
upon the customary method used in the logic of attributes (the so-called 
lower functional logic; see §§ 22-26), many features of our procedure and 
most of the concepts here introduced are new. Some of these concepts 
(especially those of §§ 31, 32, and 34) will be continually used later in our 
theory of inductive logic (in Vol. II); the concepts of this part will not 
be used before § 107.

Since in what follows we restrict ourselves to properties, we shall speak
not of all systems 𝔏 but only of those whose primitive predicates are all of
degree one. We shall designate the (finite) number of these predicates in a
system of this kind with ‘π’. (This use has of course nothing to do with
the use of the same Greek letter for the number 3.14 . . . in analysis.)
We call these systems the systems 𝔏^π. 𝔏^π_N is a finite system of this kind;
for instance, 𝔏^3_100 is the system which contains one hundred in and three
pr of degree one and no pr of higher degrees. 𝔏^π_∞ is an infinite system of
this kind; for instance, 𝔏^5_∞ is that system which contains the infinite
sequence of in and five pr of degree one.

In our theory of inductive logic to be constructed later, the definitions
of the fundamental concepts, e.g., the concept of degree of confirmation
and related ones, and some theorems will be formulated in a general way,
with respect to any system 𝔏. However, most of the theorems, especially
those which deal with the various kinds of inductive inferences and which
state methods for the computation of the degree of confirmation for
sentences of certain forms, will apply to the systems 𝔏^π only (see § 110).
In other words, the bulk of our inductive logic will deal only with
properties of individuals, not with relations between individuals, except for
those relations which are defined on the basis of properties. At the present
time this restriction seems natural and well justified, in view of the fact
that deductive logic took more than two thousand years from its start
with Aristotle to the first logic of relations (De Morgan, 1860). Inductive
logic, that is, the theory of probability, is only a few hundred years old.
Therefore, it is not surprising to see that so far nobody has made an
attempt to apply it to relations. (Incidentally, the same holds for the
theory of probability₂, i.e., relative frequency.) The inclusion of relations
in deductive logic obviously causes a certain increase in complexity.
The corresponding increase in complexity for inductive logic is very much
greater. One of the points where the great and so far unsurmounted
difficulties in connection with relations in inductive logic arise is the
following one. For the determination of m*, the number τ of the Str in any
finite system is required (see (2) in § 110A). A general formula for τ in the
systems 𝔏^π_N can easily be given (T35-1d). However, for systems with pr
of higher degrees, no analogous theorem is known, not even for the
simplest case of systems 𝔏_N with a pr of degree two as the only pr. In other
words, the deductive logic of relations, although widely developed in
other respects, is today unable to give us a general formula stating the
number of structures of one dyadic relation for finite N, let alone the same
for several relations.
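
For very small N the number of structures of one dyadic relation can at least be counted by exhaustive enumeration. The following Python sketch does this under the assumption that two state-descriptions describe the same structure exactly when one arises from the other by a permutation of the in. The function name `count_dyadic_structures` is hypothetical. This is a brute-force count, not a general formula, and it becomes impractical beyond a handful of individuals.

```python
from itertools import permutations, product


def count_dyadic_structures(n):
    """Count isomorphism classes of state-descriptions for one dyadic pr on n in."""
    pairs = [(a, b) for a in range(n) for b in range(n)]
    perms = list(permutations(range(n)))
    seen = set()
    classes = 0
    # Every subset of the ordered pairs is one state-description.
    for bits in product((False, True), repeat=len(pairs)):
        relation = frozenset(p for p, bit in zip(pairs, bits) if bit)
        if relation in seen:
            continue
        classes += 1
        # Mark every state-description isomorphic to this one as already counted.
        for perm in perms:
            seen.add(frozenset((perm[a], perm[b]) for a, b in relation))
    return classes


print([count_dyadic_structures(n) for n in (1, 2, 3)])  # [2, 10, 104]
```

The rapid growth of these values (already 104 for three individuals) gives a feeling for why a closed formula is hard to come by.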


A solution of the problem just mentioned would be of importance not only
for deductive and inductive logic but also for certain branches of science.
Preliminary work for the solution of the simplest case, that of one dyadic relation,
has been done in that branch of combinatory topology which is known as the
theory of graphs. For a survey of this theory see Dénes König, Theorie der
endlichen und unendlichen Graphen: Kombinatorische Topologie der
Streckenkomplexe (“Mathematik und ihre Anwendungen,” ed. Artin, Vol. 16 [Leipzig,
1936]). König discusses (in § 5) the problem of the numbers of graphs of
various kinds and refers to the original investigations by C. Jordan (“Sur les
assemblages de lignes”, Journal f. reine u. angew. Math., 70 [1869], 185-90) and
A. Cayley (“On the analytical forms called trees, with application to the
theory of chemical combinations”, Report British Assoc. Advanc. Science, 1875,
pp. 257-305, reprinted in Mathem. Papers, IX, 427-60). The graphs correspond
to the structures of symmetric relations (D25-2c). For the solution of the
simplest case of our problem, the results found by the authors just mentioned must
be generalized so as to cover also the nonsymmetric relations.


As primitive predicates pr₁, pr₂, ..., pr_π in any system 𝔏^π we take ‘P₁’,
‘P₂’, ..., ‘P_π’. If we consider a sequence of systems 𝔏^π_N with increasing N,
we shall usually take the pr and hence their number π as unchanged;
for example, in the sequence 𝔏^3_1, 𝔏^3_2, ... each system contains the same
three pr ‘P₁’, ‘P₂’, ‘P₃’, and so does the system 𝔏^3_∞, which is the infinite
system corresponding to the mentioned sequence of finite systems.

We shall now explain a procedure for defining, on the basis of the pr in
a system 𝔏^π, molecular predicates of a particular kind (see the terms
explained at the beginning of § 25), the Q-predicates ‘Q₁’, ‘Q₂’, etc. The
properties designated by these predicates will be called Q-properties. Let
us illustrate the procedure for π = 3, hence for a system 𝔏^3. The number
of in is irrelevant for this procedure. Hence the following construction is
the same for every system 𝔏^3_N, including 𝔏^3_∞. The subsequent table A1
contains three argument columns for ‘P₁’, ‘P₂’, and ‘P₃’; the lines show
the eight possible distributions of two values, affirmation and negation,
designated by ‘+’ and ‘−’, respectively, among the three pr. Thus this
table is analogous to the truth-table for three sentences constructed in
§ 21B. In analogy to the k-sentences there (T21-7), we have here the
molecular predicate expressions listed in the second column; we call them
Q-predicate-expressions. They are conjunctions of basic predicate
expressions, that is, of pr or their negations. In the third column the Q-predicates
are introduced as abbreviations for the Q-predicate-expressions. (A1
introduces the examples ‘Q₁’, etc., in the object language 𝔏^3; D1 introduces
the terms ‘Q-predicate-expression’, etc., in the metalanguage.)
+A31-1. Table for the Q-predicates in 𝔏^3

  P₁  P₂  P₃     Q-predicate-expressions     Q-predicates
  +   +   +      P₁ . P₂ . P₃                Q₁
  +   +   −      P₁ . P₂ . ~P₃               Q₂
  +   −   +      P₁ . ~P₂ . P₃               Q₃
  +   −   −      P₁ . ~P₂ . ~P₃              Q₄
  −   +   +      ~P₁ . P₂ . P₃               Q₅
  −   +   −      ~P₁ . P₂ . ~P₃              Q₆
  −   −   +      ~P₁ . ~P₂ . P₃              Q₇
  −   −   −      ~P₁ . ~P₂ . ~P₃             Q₈


D31-1. For any system 𝔏^π.

a. 𝔄_i is a Q-predicate-expression =Df 𝔄_i is either the conjunctive
predicate expression containing all pr in their alphabetical order
(P₁ . P₂ . ... . P_π) or is formed from this expression by replacing some
of the pr with their negations.

+b. 𝔄_i is a Q-predicate =Df 𝔄_i is a predicate defined as abbreviation
for a Q-predicate-expression. ‘Q_m’ is taken as abbreviation for the
mth of the Q-predicate-expressions in their lexicographical order.

c. 𝔐_i is a Q-matrix =Df 𝔐_i is a full matrix of a Q-predicate.

d. i is a Q-sentence =Df i is a full sentence of a Q-predicate.

+D31-2. For 𝔏^π. κ =Df the number of the Q-predicates.

+T31-1. For any system 𝔏^π, κ = 2^π. (From T40-31f.)

If we take in the table A1, instead of the three pr, full sentences of them
with the same in, for example, ‘P₁a’, ‘P₂a’, ‘P₃a’, then the table becomes
an ordinary truth-table, as in § 21B. For instance, the k-sentence
‘P₁a . ~P₂a . P₃a’ corresponds to the third line. This sentence can be
abbreviated by ‘(P₁ . ~P₂ . P₃)a’ (A25-1) and hence now, with the help of
A1, by ‘Q₃a’.
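
The construction of the table generalizes to any number π of primitive predicates. The following Python sketch lists the Q-predicate-expressions in the order used above (affirmation before negation, leftmost pr varying slowest). The function name `q_predicate_table` and the textual rendering with `P1`, `~P1`, and `.` are assumptions of this sketch, not notation of the system itself.

```python
from itertools import product


def q_predicate_table(pi):
    """Return the rows of the Q-table for pi primitive predicates.

    Each row is (m, signs, expression): m is the index of Q_m, signs is a
    tuple of booleans (True for '+'), expression is the conjunction as text.
    """
    rows = []
    # product((True, False), ...) lists '+' before '-', which reproduces
    # the order of the lines in table A31-1.
    for m, signs in enumerate(product((True, False), repeat=pi), start=1):
        parts = [("" if s else "~") + f"P{j}" for j, s in enumerate(signs, start=1)]
        rows.append((m, signs, " . ".join(parts)))
    return rows


for m, signs, expr in q_predicate_table(3):
    marks = " ".join("+" if s else "-" for s in signs)
    print(f"{marks}   {expr:<18} Q{m}")
```

For π = 3 this prints eight lines, in agreement with κ = 2^π (T31-1); the third line is the expression abbreviated by ‘Q₃’.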

A look at the table A1 shows that every individual must have one and
only one of the Q-properties. Hence they form a division (T2d). This
Q-division is the strongest division possible in 𝔏^π; that is to say, no factual
property stronger than a Q-property can be defined in 𝔏^π; in other words,
a Q-property cannot be subdivided into several factual properties by
means of the pr in 𝔏^π. In the terminology of Aristotelian logic, the
Q-properties are the infimae species.


T31-2. For any system 𝔏^π.
+a. Any two distinct Q-predicates are L-exclusive. Hence a conjunction
of two Q-sentences with two distinct Q-predicates and the same in is
L-false. (From T21-7a, T25-1d.)

b. The Q-predicates are L-disjunct. Hence a disjunction of full sentences
of all Q-predicates with the same in is L-true. (From T21-7b,
D25-1e.)

c. Every Q-predicate is factual. Hence every Q-sentence is factual.
(From T20-5g, D25-1c, T25-1c.)

+d. The Q-predicates form a division. (From (a), (b), (c), D25-4.)

e. Let ‘M’ be a molecular predicate L-equivalent to a disjunction of n
Q-predicates (1 ≤ n ≤ κ − 1). Then the disjunction of the remaining
κ − n Q-predicates is L-equivalent to ‘~M’. (From (d), T25-4b.)

+f. Every molecular predicate expression and hence every molecular
predicate is either L-empty or L-equivalent to a disjunction of n
Q-predicates (1 ≤ n ≤ κ). (From T21-7d.)

g. Any disjunction of n Q-sentences (n ≥ 1) is not L-false. (From (c),
T20-2q.)


§ 32. Logical Width 


The concept of the logical width of a molecular predicate expression 𝔄_i is
defined in the following way (D1). If 𝔄_i is L-empty, we ascribe to it the width 0.
Otherwise, 𝔄_i is L-equivalent to a disjunction of Q-predicates; in this case, we
take the number of these Q-predicates as the width of 𝔄_i.

Let P₁ and P₂ be two properties which are logically independent of
each other, for example, Small and Black. Then the property P₁ . P₂
(Small-and-Black) is in a certain sense stronger or narrower than P₁; P₁ is
weaker or wider. And P₁ ∨ P₂ (Small-or-Black) is in this sense still wider
than P₁. By ‘wider’ we do not mean here ‘having a greater extension’. The
extension of the property P₁ ∨ P₂, that is, the class of individuals
possessing this property, may be greater than that of P₁ or it may be the same.
The latter would be the case if all black things in the world happened to
be small. Whether or not this is the case is a factual question. What we
mean by ‘wider’ is not a factual but a logical relation. The property
P₁ ∨ P₂ is wider than P₁ by admitting more possibilities; for instance, the
possible, that means, not L-empty, property ~P₁ . P₂ is admitted by the
first but excluded by the second.

The method just described for comparing the widths of two properties
is applicable only in the special case where one of the properties L-implies
the other. If P₁, P₂, P₃, P₄ are four properties all logically independent of
one another, then this method does not enable us to compare P₁ ∨ P₂
with P₃ ∨ P₄. If we wish to make possible a comparison of widths in all
cases, we need additional conventions. Now, any language system 𝔏
furnishes a natural basis for these conventions with respect to the
properties expressible in the system by its selection of primitive properties. For
the sake of simplicity, we restrict the discussion to molecular properties
and the predicate expressions or predicates designating them in a
system 𝔏^π. In a system 𝔏^π, the Q-properties are the narrowest non-L-empty
properties. Thus it seems natural to assign to each of them the smallest
positive width, say, 1. To the L-empty property we assign the width 0.
Every non-L-empty property which is not a Q-property is a disjunction
of two or more Q-properties (T31-2f); it seems natural to take the
number of these Q-properties as its width. Thus we are led to the following
definition (D1).


+D32-1. Let 𝔐_i be a matrix of degree one in 𝔏^π, with 𝔦_k as the only
free variable.
a. 𝔐_i has (the logical width or briefly) the width w =Df
either (1) 𝔐_i is L-empty and w = 0;
or (2) 𝔐_i is L-equivalent to a Q-matrix with 𝔦_k and w = 1;
or (3) 𝔐_i is L-equivalent to a disjunction of w distinct Q-matrices
with 𝔦_k (w > 1).
b. 𝔐_i has (the relative logical width or briefly) the relative width q =Df
𝔐_i has the width w, and q = w/κ.


In analogy to D25-1 and 2, we shall use the terms ‘width’ and ‘relative 
width’ in the following three ways. Each of these terms is applied (A) to 
a matrix (of degree one), (B) to a corresponding predicate expression, for 
instance, a molecular predicate, (C) to the corresponding property. 
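
As a minimal sketch of D32-1, the width of a molecular property can be computed by counting the Q-properties it admits. In the following Python code a property is represented, as an assumption of this sketch, by a function from a sign distribution (a tuple of booleans for P₁, ..., P_π) to a truth-value; the names `width` and `relative_width` are hypothetical.

```python
from itertools import product


def width(predicate, pi):
    """Logical width of a molecular property over pi primitive predicates.

    predicate -- function taking a tuple of pi booleans (the signs of
                 P1..P_pi in one Q-property) and returning True or False
    """
    # Count the Q-properties (sign distributions) which the property admits.
    return sum(1 for signs in product((True, False), repeat=pi) if predicate(signs))


def relative_width(predicate, pi):
    """Relative width q = w / kappa, with kappa = 2**pi (T31-1)."""
    return width(predicate, pi) / 2 ** pi


small_and_black = lambda s: s[0] and s[1]   # P1 . P2
small = lambda s: s[0]                      # P1
small_or_black = lambda s: s[0] or s[1]     # P1 v P2

print(width(small_and_black, 3), width(small, 3), width(small_or_black, 3))  # 2 4 6
```

In a system with three pr the three properties of the Small-and-Black illustration thus receive the widths 2, 4, and 6, in line with the informal ordering from narrower to wider.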

The concept of logical width is very important for inductive logic. One
of the decisive defects of the classical theory of probability is the failure
to take into consideration the width of the properties involved. Some of
the fundamental principles and theorems of the classical theory lead to
contradictions because they are formulated in a too general way for all
properties. One of the modifications by which we shall eliminate these
contradictions will consist in making the degree of confirmation dependent,
among other things, upon the widths of the properties involved.


+T32-1. Let 𝔄_i be a molecular predicate expression in 𝔏^π, 𝔄_j a molecular
predicate abbreviating 𝔄_i, and 𝔐_k a molecular matrix which is the
expansion of a full matrix of 𝔄_j and hence of 𝔄_i.

a. There is one and only one integer w which is the width of 𝔐_k, and
hence of 𝔄_i and 𝔄_j, and 0 ≤ w ≤ κ. (From T31-2f.)

b. There is one and only one (rational) real number q which is the relative
width of 𝔐_k, and hence of 𝔄_i and 𝔄_j, and 0 ≤ q ≤ 1. (From (a).)

+T32-2. Let 𝔄_i be a molecular predicate expression or a molecular
predicate in 𝔏^π with the width w and hence the relative width q = w/κ.

a. 𝔄_i is L-empty, and hence every full sentence L-false, if and only if
w = 0 and hence q = 0. (From T25-1b.)

b. 𝔄_i is L-universal, and hence every full sentence L-true, if and only
if w = κ and hence q = 1. (From T31-2b, T25-1a.)

c. 𝔄_i is factual, and hence every full sentence factual, if and only if
0 < w < κ and hence 0 < q < 1. (From (a), (b).)

d. ~𝔄_i has the width κ − w, and hence the relative width 1 − q.
(From T31-2e.)

T32-3. Let k be a conjunction of π basic sentences in 𝔏^π with distinct
pr but the same in_i. Hence k may be abbreviated by (𝔄_j)in_i, where 𝔄_j is
a molecular predicate expression of conjunctive form with π components,
each of them being a pr or its negation, every pr occurring exactly once.

a. 𝔄_j is L-equivalent to a Q-predicate-expression and hence to a
Q-predicate, and thus has the width 1. (From D31-1a.)

b. k is L-equivalent to a Q-sentence with in_i. (From (a).)


T32-4. Let k be a conjunction of n basic sentences in 𝔏^π (1 ≤ n < π)
with distinct pr but the same in_i. Hence k may be abbreviated by (𝔄_j)in_i,
where 𝔄_j is a molecular predicate expression of conjunctive form with n
components, each of them being a pr or its negation, no pr occurring more
than once.

a. (1) k is L-equivalent to a disjunction of 2^(π−n) distinct Q-sentences
with in_i; hence
(2) 𝔄_j has the width 2^(π−n).


Proof. The number m of those pr which do not occur in k is π − n. We construct
first the conjunction of the atomic sentences with these m pr and with in_i in
their lexicographical order, and furthermore all those other conjunctions which
are formed from the first one by replacing some atomic sentences with their
negations. The number of all these conjunctions, including the first, is 2^m
(T40-31g). Let these conjunctions be h₁, h₂, .... Let h be their disjunction;
then ⊢ h (T21-7b). Therefore, k is L-equivalent to k . h, that is, k . (h₁ ∨ h₂ ∨ ...),
hence, by distribution, to (k . h₁) ∨ (k . h₂) ∨ .... For any p from 1 to 2^m,
k . h_p is a conjunction of π basic sentences with π distinct pr and the same in_i
and hence is L-equivalent to a Q-sentence with in_i (T3b). If h_i and h_j are any
two distinct ones of these conjunctions, then they are L-exclusive (T21-7a), and
hence k . h_i and k . h_j are L-exclusive; therefore the corresponding Q-sentences
are distinct. Thus the 2^m sentences of the form k . h_p correspond to 2^m distinct
Q-sentences with in_i. 1. k is L-equivalent to their disjunction. 2. Therefore, 𝔄_j
is L-equivalent to the disjunction of the 2^m distinct Q-predicates, and hence has
the width 2^m.


b. The relative width of 𝔄_j is 1/2^n; hence it is independent of π.


Proof. The relative width is 2^(π−n)/κ (a), = 2^(π−n)/2^π (T31-1), = 1/2^n.


Corollary. A primitive predicate has the width κ/2 and the relative
width 1/2. (From (b), for n = 1.)


T32-5. Let p predicates be given which form a division (D25-4) in 𝔏^π.

a. The sum of the widths of the given predicates is κ.


Proof. Let w_m be the width of the mth predicate (m = 1 to p). Since the
predicates are factual (T25-2), 0 < w_m < κ (T2c). The mth predicate is
L-equivalent to a disjunction of w_m Q-predicates (D1a(3)). If we form these
disjunctions for the p predicates, then every Q-predicate occurs in one and only
one of them, because the predicates are L-disjunct and L-exclusive in pairs
(D25-4a, b). Hence the assertion.


b. The sum of the relative widths of the given predicates is 1.
(From (a).)


T32-6.

a. ‘(Q₁ ∨ Q₂ ∨ Q₃) . (Q₂ ∨ Q₃ ∨ Q₄)’ is L-equivalent to ‘Q₂ ∨ Q₃’, and
hence has the width 2.

Proof. By multiple distribution (T21-5m(3)), the given conjunction is
L-equivalent to a disjunction of nine components, every component being a
conjunction of one Q-predicate from the first parenthesis and one from the
second. Of these nine conjunctions, seven contain two distinct Q-predicates and
hence are L-empty (T31-2a) and hence may be dropped as components of
the disjunction. What remains is ‘(Q₂ . Q₂) ∨ (Q₃ . Q₃)’, hence ‘Q₂ ∨ Q₃’.


b. ‘(Q₁ ∨ Q₂) . (Q₁ ∨ Q₄)’ is L-equivalent to ‘Q₁’, and hence has the
width 1. (Analogous to (a).)

c. Let 𝔄_i and 𝔄_j be molecular predicate expressions or molecular
predicates, 𝔄_i being L-equivalent to a disjunction of m Q-predicates and
𝔄_j to a disjunction of n Q-predicates (m ≥ 1, n ≥ 1). Let p be the
number of those Q-predicates which occur in both disjunctions.

(1) If p = 0, then 𝔄_i . 𝔄_j is L-empty.

(2) If p = 1, then 𝔄_i . 𝔄_j is L-equivalent to the common Q-predicate.

(3) If p > 1, then 𝔄_i . 𝔄_j is L-equivalent to the disjunction of the
common Q-predicates.

Hence, in any case, the width of 𝔄_i . 𝔄_j is p. (Analogous to (a).)
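
Part (c) amounts to a set intersection. The following Python sketch represents each molecular predicate, as an assumption of this sketch, by the set of indices m of the Q-predicates ‘Q_m’ in its disjunction; the function name `conjunction_width` is hypothetical.

```python
def conjunction_width(q_set_i, q_set_j):
    """Width of the conjunction of two molecular predicates (T32-6c).

    Each argument is the set of indices m of the Q-predicates Q_m to whose
    disjunction the predicate is L-equivalent.
    Returns (p, common): the width p and the sorted common indices.
    """
    common = set(q_set_i) & set(q_set_j)
    return len(common), sorted(common)


print(conjunction_width({1, 2, 3}, {2, 3, 4}))  # (2, [2, 3])
print(conjunction_width({1, 2}, {5, 6}))        # (0, [])
```

An empty intersection corresponds to case (1), an L-empty conjunction with width 0; a single common index corresponds to case (2).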


The following theorem facilitates the computation of the width of a
molecular predicate expression in which not all pr occur.


T32-7. Let i be a molecular sentence in 𝔏^π all of whose ultimate
components are atomic sentences with in_i; let the number of different pr
occurring in i be n, where n < π. Let the truth-table for i with respect to
the n occurring atomic sentences have the value T on exactly m of the 2^n
lines. Obviously, i can be abbreviated by (𝔄_j)in_i, where 𝔄_j is a molecular
predicate expression constructed out of the n pr. Let i not be L-false;
hence m > 0, and 𝔄_j is not L-empty.

a. Let w = m × 2^(π−n), = (m/2^n) κ. Then i is L-equivalent to a disjunction
of w distinct Q-sentences with in_i; hence, 𝔄_j has the width w.


Proof. i is L-equivalent to a disjunction of m conjunctions (T21-7d). Each
of these conjunctions has as components n basic sentences with the n pr and
with in_i, and is hence L-equivalent to a disjunction of 2^(π−n) distinct Q-sentences
with in_i (T4a). If k and k’ are any two distinct ones of these conjunctions, then
they are L-exclusive (T21-7a), hence k . k’ is L-false. Therefore, no Q-predicate
can occur in both k and k’; because otherwise k . k’ would be L-equivalent to a
disjunction of p Q-sentences with in_i (p ≥ 1) (T6c) and would hence not be
L-false (T31-2g). Thus i is L-equivalent to a disjunction of m × 2^(π−n)
Q-sentences with in_i.


b. The relative width of 𝔄_j is m/2^n; hence it is independent of π.
(From (a).)
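
The shortcut of T32-7a can be checked against a direct count over all Q-properties. The following Python sketch does so for a small case. The function names are hypothetical, and the sketch assumes without loss of generality that the n occurring pr are the first n of the π primitive predicates.

```python
from itertools import product


def width_by_truth_table(truth_function, n, pi):
    """Width according to T32-7a: m * 2**(pi - n).

    truth_function -- function of n booleans (values of the n occurring pr)
    """
    m = sum(1 for row in product((True, False), repeat=n) if truth_function(*row))
    return m * 2 ** (pi - n)


def width_by_enumeration(truth_function, n, pi):
    """The same width, counted directly over all 2**pi Q-properties."""
    return sum(
        1
        for signs in product((True, False), repeat=pi)
        if truth_function(*signs[:n])  # only the first n pr are consulted
    )


implies = lambda p1, p2: (not p1) or p2  # P1 > P2, true on m = 3 of the 4 lines

print(width_by_truth_table(implies, 2, 5), width_by_enumeration(implies, 2, 5))  # 24 24
```

The relative width in this case is 24/32 = 3/4 = m/2^n, whatever the value of π, as part (b) states.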


§ 33. The Q-Normal Form 


It is shown how a given sentence with primitive predicates can be trans- 
formed into a sentence with Q-predicates and, in particular, into a Q-normal 
form (D1). The latter will be used in inductive logic. 


In this section we shall show how sentences of 𝔏^π, written in primitive
notation, can be transformed into L-equivalent sentences with Q-predicates,
and finally into a particular form, called the Q-normal form, similar
to the disjunctive normal form. Later, in inductive logic, we shall make
use of the Q-normal form for the computation of the degree of confirmation.
T33-1. Let pr_i be any pr in 𝔏^π.

a. pr_i is L-equivalent to the disjunction of those Q-predicate-expressions
in which it occurs unnegated, and hence to the disjunction of
the corresponding Q-predicates. (These Q-predicates correspond to
those lines in the table of Q-predicates (see A31-1) where pr_i has the
value +.) (From T21-7d.)

b. pr_i has the width κ/2, and hence the relative width 1/2. (From (a).)

c. ~pr_i is L-equivalent to the disjunction of those Q-predicate-expressions
in which ~pr_i occurs, and hence to the disjunction of the
corresponding Q-predicates. (These Q-predicates correspond to those
lines in the table of Q-predicates where pr_i has the value −.) (From
(a), T31-2e.)

d. ~pr_i has the width κ/2, and hence the relative width 1/2. (From (c).)

e. Every basic sentence with in_i is L-equivalent to a disjunction of
Q-sentences with in_i, whose number is κ/2. (The transformation of any
given basic sentence into this disjunction can be carried out according
to either (a) or (c).)

T33-2. Let k be a conjunction of n basic sentences (n ≥ 2) with the
same in_i. Hence, k can be abbreviated by (𝔄_j)in_i, where 𝔄_j is a molecular
predicate expression in the form of a conjunction with n components,
each being a pr or a negation of a pr. For every conjunctive component in
k (or in 𝔄_j), let the class of the corresponding Q-predicates be determined
according to T1a or c; let 𝔎_i be the class product of all these classes. Then
the following holds.

a. If 𝔎_i is empty, then 𝔄_j is L-empty and k is L-false. (This is the case
if and only if a pr occurs both unnegated and negated.) (From
T32-6c(1).)

b. If 𝔎_i contains only one Q-predicate, then 𝔄_j is L-equivalent to it
and k is L-equivalent to its full sentence with in_i. (From T32-6c(2).)

c. If 𝔎_i contains two or more Q-predicates, then 𝔄_j is L-equivalent to
their disjunction and k is L-equivalent to the disjunction of their
full sentences with in_i. (From T32-6c(3).)

d. The width of 𝔄_j is the number of Q-predicates in 𝔎_i. (From (a),
(b), (c).)

Examples for the transformation of basic sentences or conjunctions of them
into formulations with Q-predicates. We take the system 𝔏^3; hence we can use
A31-1. (1) and (3) follow from T1a, (2) and (4) from T1c, (5) from T2b (or
directly from A31-1), (6), (7), and (8) from T2c.

1. ‘P₁a’ is L-equivalent to ‘Q₁a ∨ Q₂a ∨ Q₃a ∨ Q₄a’.
2. ‘~P₁a’ is L-equivalent to ‘Q₅a ∨ Q₆a ∨ Q₇a ∨ Q₈a’.
3. ‘P₃a’ is L-equivalent to ‘Q₁a ∨ Q₃a ∨ Q₅a ∨ Q₇a’.
4. ‘~P₃a’ is L-equivalent to ‘Q₂a ∨ Q₄a ∨ Q₆a ∨ Q₈a’.
5. ‘~P₁b . ~P₂b . P₃b’ is L-equivalent to ‘Q₇b’.
6. ‘~P₁b . P₃b’ is L-equivalent to ‘Q₅b ∨ Q₇b’.
7. ‘~P₁a . P₂a’ is L-equivalent to ‘Q₅a ∨ Q₆a’.
8. ‘P₂a . ~P₃a’ is L-equivalent to ‘Q₂a ∨ Q₆a’.
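
Transformations of this kind can be sketched mechanically: for a conjunction of basic predicate expressions one collects the lines of the Q-table that agree with every component, which is the class product described in T33-2. In the following Python code the encoding of a component as a pair (j, value) and the function names `q_indices` and `as_disjunction` are assumptions of this sketch.

```python
from itertools import product


def q_indices(literals, pi=3):
    """Indices m of the Q-predicates corresponding to a conjunction of
    basic predicate expressions (T33-1, T33-2).

    literals -- list of pairs (j, value): (j, True) stands for P_j,
                (j, False) for ~P_j
    """
    rows = product((True, False), repeat=pi)  # lines of the Q-table, in order
    return [
        m
        for m, signs in enumerate(rows, start=1)
        if all(signs[j - 1] == value for j, value in literals)
    ]


def as_disjunction(indices, individual):
    """Render the result as text; an empty class yields the L-false '~t'."""
    return " v ".join(f"Q{m}{individual}" for m in indices) if indices else "~t"


print(as_disjunction(q_indices([(1, True)]), "a"))                          # Q1a v Q2a v Q3a v Q4a
print(as_disjunction(q_indices([(1, False), (2, False), (3, True)]), "b"))  # Q7b
print(as_disjunction(q_indices([(1, False), (3, True)]), "b"))              # Q5b v Q7b
print(as_disjunction(q_indices([(2, True), (2, False)]), "a"))              # ~t
```

The last call shows the case of T33-2a, where a pr occurs both unnegated and negated and the class product is empty.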


The following definition (D1) gives the rules for transformation into a
Q-normal form. Sentences of this form, as referred to in D1 and T4,
belong, strictly speaking, to enlarged systems containing the Q-predicates.

D1 and T4 refer to sentences in which the Q-predicates (‘Q₁’, etc.) actually
occur, not only to the expansions of such sentences in primitive notation. Thus,
strictly speaking, these sentences do not belong to our systems 𝔏^π but to
enlarged systems *𝔏^π. For instance, the system *𝔏^3 is constructed from 𝔏^3 by the
addition of the eight Q-predicates (A31-1). In this system, a rule is laid
down to the effect that the range of a Q-sentence is the same as that of the
corresponding sentence with pr (e.g., the range of ‘Q₇b’ is the same as that of
‘~P₁b . ~P₂b . P₃b’, see example (5) above). In virtue of this additional rule
of ranges for any system *𝔏^π, there is a close relationship between *𝔏^π and 𝔏^π
of the following kind. If i is any sentence in *𝔏^π containing one or more
Q-predicates and j is the sentence formed from i by the elimination of all
Q-predicates on the basis of the additional rule of ranges (e.g., in *𝔏^3, on the basis of
the table A31-1), then i and j are L-equivalent in *𝔏^π. Furthermore, j has the
same range in *𝔏^π as in 𝔏^π; therefore, it has the same logical properties in both
systems (e.g., L-truth, L-implying a certain other sentence, etc.). Because of
this relationship between the enlarged systems *𝔏^π and the original systems 𝔏^π
we may interpret the theorems concerning sentences with Q-predicates (from
T31-2 on, and including those we shall state later) in the following three ways.
Any such theorem holds (i) for the sentence in question with Q-predicates in
*𝔏^π, (ii) for the corresponding sentence without Q-predicates in *𝔏^π, (iii) for


rect if we place the sentence not in the original system but in the enlarged sys- 
tem. However, in order to avoid unnecessary complications in the formulation 
of our theorems, definitions, etc., we shall omit references to the enlarged
systems and continue to refer simply to the systems 𝔏^π.


D33-1. Let i be any sentence in 𝔏^π_N or any nongeneral sentence in 𝔏^π_∞.
j is a sentence of Q-normal form corresponding to i =Df j is formed from i
by applying the following rules until none of them is applicable any more.
(See explanations in D21-2.)

a. As D21-2a.

b. Every defined expression occurring, except the Q-predicates, is
eliminated.

c to p. As D21-2c to p.

q. Every basic sentence k is replaced by a disjunction h of Q-sentences
with the same in as in k; if k is atomic, h is constructed according to
T1a; if k is the negation of an atomic sentence, h is constructed
according to T1c.

r. A conjunction containing as components two Q-sentences with the
same in but two distinct Q-predicates is replaced by ‘~t’.


In practice, we can shorten the transformation considerably by pro- 
ceeding in the following way instead of applying (q) and (r): we rearrange 
a conjunction of basic sentences by grouping together the components 
with the same in; then we replace a subconjunction of basic sentences 
with the same in by a Q-sentence or a disjunction of such according to T2b 
or c. (See examples below.) 


T33-4. Let i be a sentence in 𝔏^π_N or a nongeneral sentence in 𝔏^π_∞. Let j
be a sentence of Q-normal form corresponding to i.

a. i and j are L-equivalent. (From T21-10a, T1a, T1c, T31-2a.)

b. j does not contain ‘~’, unless j is ‘~t’.

Proof. After the application of the rules D1a to p, ‘~’ occurs only in basic
sentences, unless the whole sentence is ‘~t’ (T21-10c). ‘~’ in basic sentences
disappears by D1q. If ‘~t’ is introduced by D1r, either it disappears by D1j
and l (i.e., D21-2j and l) or the whole becomes ‘~t’.

c. j has one of the following forms: (1) ‘t’; (2) ‘~t’; (3) a Q-sentence;
(4) a conjunction of n Q-sentences (n ≥ 2) with n distinct in; (5) a
disjunction of two or more components of the forms (3) or (4).
(From T21-10c, (b).)

Example for a transformation into Q-normal form. Suppose the following
sentence is given:

‘$P_3a \,.\, {\sim}P_1b \,.\, P_3b \,.\, [P_1a \lor P_2b \supset {\sim}P_2a]$’.

Application of rules (b), (m), and (p) in D1 (i.e., in D21-2) yields:

‘$(P_3a \,.\, {\sim}P_1b \,.\, P_3b \,.\, {\sim}P_1a \,.\, {\sim}P_2b) \lor (P_3a \,.\, {\sim}P_1b \,.\, P_3b \,.\, {\sim}P_2a)$’.

This is a disjunctive normal form. According to the shorter procedure
mentioned, we group the basic sentences in subconjunctions with the same
$\mathfrak{in}$:

‘$[(P_3a \,.\, {\sim}P_1a) \,.\, ({\sim}P_1b \,.\, P_3b \,.\, {\sim}P_2b)] \lor [(P_3a \,.\, {\sim}P_2a) \,.\, ({\sim}P_1b \,.\, P_3b)]$’.

Each subconjunction is now replaced according to T2 (see the examples (7),
(5), (8), (6) following T2):

‘$[(Q_5a \lor Q_7a) \,.\, Q_7b] \lor [(Q_3a \lor Q_7a) \,.\, (Q_5b \lor Q_7b)]$’.

Now rule (p) (distribution of a disjunction) is applied three times:

‘$[Q_5a \,.\, Q_7b] \lor [Q_7a \,.\, Q_7b] \lor [Q_3a \,.\, Q_5b] \lor [Q_3a \,.\, Q_7b] \lor [Q_7a \,.\, Q_5b] \lor [Q_7a \,.\, Q_7b]$’.

According to rule (f), the second component of the disjunction is omitted
because it is the same as the last:

‘$[Q_5a \,.\, Q_7b] \lor [Q_3a \,.\, Q_5b] \lor [Q_3a \,.\, Q_7b] \lor [Q_7a \,.\, Q_5b] \lor [Q_7a \,.\, Q_7b]$’.

This is a Q-normal form.
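
A transformation of this kind can be checked mechanically. The following Python sketch is an illustration only and not part of the system. It takes the example sentence to be ‘$P_3a \,.\, {\sim}P_1b \,.\, P_3b \,.\, [P_1a \lor P_2b \supset {\sim}P_2a]$’ in a system with three primitive predicates, and it assumes that the Q-predicates are numbered in lexicographical order with $Q_1$ as the conjunction of the three unnegated predicates. All function names are hypothetical.

```python
from itertools import product


def q_index(p1, p2, p3):
    """Index (1..8) of the Q-predicate fixed by the three truth-values.

    Assumption: lexicographical order, Q1 = P1.P2.P3, Q8 = ~P1.~P2.~P3.
    """
    return 1 + 4 * (not p1) + 2 * (not p2) + (not p3)


def example_sentence(a, b):
    """Truth-value of the example sentence for individuals a and b.

    a and b are triples (P1, P2, P3) of truth-values.
    """
    p1a, p2a, p3a = a
    p1b, p2b, p3b = b
    # 'P1a v P2b > ~P2a' written as '~(P1a v P2b) v ~P2a'
    conditional = (not (p1a or p2b)) or (not p2a)
    return p3a and (not p1b) and p3b and conditional


def q_normal_components():
    """All pairs (m, n) such that 'Qm a . Qn b' is compatible with the sentence."""
    triples = list(product([True, False], repeat=3))
    return sorted({(q_index(*a), q_index(*b))
                   for a in triples for b in triples
                   if example_sentence(a, b)})


if __name__ == "__main__":
    for m, n in q_normal_components():
        print(f"Q{m}a . Q{n}b")
```

Each pair returned corresponds to one disjunctive component of the Q-normal form; under the stated assumptions there are five of them.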
§ 34. The Q-Numbers


Any $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$ can be transformed into a conjunction
of $N$ full sentences of Q-predicates, one for each $\mathfrak{in}$. Thus,
$\mathfrak{Z}_i$ is L-equivalent to an individual distribution for all
$\mathfrak{in}$ with respect to the Q-division (T1). The number of full sentences
of ‘$Q_m$’ in the conjunction mentioned, in other words, the number of those
individuals which in $\mathfrak{Z}_i$ have the property $Q_m$, is called the $m$th
Q-number in $\mathfrak{Z}_i$. Thus, $\mathfrak{Z}_i$ determines a sequence of
$\kappa$ Q-numbers, whose sum is $N$. Two $\mathfrak{Z}$ are isomorphic if and
only if they have the same Q-numbers (T3). Therefore, any $\mathfrak{Str}$, and
the structure described by it, is completely characterized by the $\kappa$
Q-numbers. Thus, any $\mathfrak{Str}$ is L-equivalent to a statistical
distribution for all $\mathfrak{in}$ with respect to the Q-division (T6). The
Q-numbers will later be used for the computation of degrees of confirmation.


We shall now see how the $\mathfrak{Z}$ in $\mathfrak{L}^\pi_N$ can be transformed
into sentences with Q-predicates. Any given $\mathfrak{Z}_i$ is a conjunction
which contains exactly one sentence from every basic pair (D18-1a). Let us
transform $\mathfrak{Z}_i$ into an L-equivalent sentence $k$ by rearranging the
conjunctive components in the following way. First, we place the components with
$\mathfrak{in}_1$, i.e., ‘$a_1$’, then those with $\mathfrak{in}_2$, and so forth,
finally those with $\mathfrak{in}_N$. For any $\mathfrak{in}$, there are $\pi$
basic sentences as components, one for each of the $\pi$ $\mathfrak{pr}$; we
arrange these components according to increasing subscripts of the
$\mathfrak{pr}$. Thus, $k$ has the form $k_1 \,.\, k_2 \,.\, \ldots \,.\, k_N$,
where $k_n$ ($n = 1$ to $N$) is the subconjunction with $\mathfrak{in}_n$. The
first component in $k_n$ is either $\mathfrak{pr}_1\mathfrak{in}_n$ or its
negation, the second either $\mathfrak{pr}_2\mathfrak{in}_n$ or its negation, and
so on. Hence, $k_n$ can be abbreviated by a Q-sentence with $\mathfrak{in}_n$
(T32-3b). Thus $k$ is transformed into a conjunction $h$ of $N$ Q-sentences, one
for each of the $N$ $\mathfrak{in}$ in $\mathfrak{L}^\pi_N$. We call $h$ the
Q-form of $\mathfrak{Z}_i$ (D1a). In this form, any $\mathfrak{in}$ occurs only
once; but a Q-predicate may have any number $m$ of occurrences
($0 \leq m \leq N$).

In $\mathfrak{L}^\pi_\infty$, we can transform the $\mathfrak{Z}$ in a similar
way; but here the Q-form is not as important as in $\mathfrak{L}^\pi_N$. In
$\mathfrak{L}^\pi_\infty$, any $\mathfrak{Z}_i$ is not a conjunction but an
infinite class of basic sentences, one from each basic pair (D18-1b). Let
$\mathfrak{K}_n$ be the subclass of $\mathfrak{Z}_i$ containing the basic
sentences with $\mathfrak{in}_n$. Then $\mathfrak{K}_n$ is finite; it contains
$\pi$ sentences, one for each of the $\pi$ $\mathfrak{pr}$. It contains
either $\mathfrak{pr}_1\mathfrak{in}_n$ or its negation, either
$\mathfrak{pr}_2\mathfrak{in}_n$ or its negation, etc., finally either
$\mathfrak{pr}_\pi\mathfrak{in}_n$ or its negation. Thus $\mathfrak{K}_n$ is
L-equivalent to a conjunction $k_n$ of its elements, arranged in the order of
increasing subscripts of the $\mathfrak{pr}$. Hence, $k_n$ can be abbreviated by
a Q-sentence with $\mathfrak{in}_n$. Thus, $\mathfrak{Z}_i$ is L-equivalent to an
infinite class of Q-sentences, one for each $\mathfrak{in}$ of the infinite
sequence of $\mathfrak{in}$ in $\mathfrak{L}^\pi_\infty$. We call this class of
Q-sentences the Q-form of $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_\infty$ (D1b).

[The Q-forms of $\mathfrak{Z}$ both in finite and in infinite systems belong,
strictly speaking, to enlarged systems containing the Q-predicates; see the
remarks preceding D33-1.]

D34-1.
+a. The Q-form of $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$ $=_{\mathrm{Df}}$ that
conjunction of $N$ Q-sentences, one for each $\mathfrak{in}$ in
$\mathfrak{L}^\pi_N$, arranged in the order of increasing subscripts of
the $\mathfrak{in}$, which is L-equivalent to $\mathfrak{Z}_i$.
b. The Q-form of $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_\infty$ $=_{\mathrm{Df}}$
that class of Q-sentences, one for each $\mathfrak{in}$ in
$\mathfrak{L}^\pi_\infty$, which is L-equivalent to $\mathfrak{Z}_i$.


We have seen earlier that the Q-predicates constitute a division
(T31-2d). Any $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$, as we see from its Q-form,
specifies for every individual which of the Q-properties it has. Hence,
$\mathfrak{Z}_i$ is L-equivalent to an individual distribution for all
$\mathfrak{in}$ (T1); its Q-form differs from an individual distribution at most
in the order of components.


+T34-1. Any $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$ is L-equivalent to an
individual distribution for all $\mathfrak{in}$ in $\mathfrak{L}^\pi_N$ with
respect to the Q-division. (From D1a, D26-6a, T31-2d.)


+D34-2. The $m$th Q-number in $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi$
$=_{\mathrm{Df}}$ the number of full sentences of ‘$Q_m$’ in the Q-form of
$\mathfrak{Z}_i$. (‘$Q_m$’ is that Q-predicate which corresponds to the $m$th
Q-predicate-expression in their lexicographical order; see D31-1a and the
examples A31-1 for $\mathfrak{L}^3$.)

In other words, the $m$th Q-number in $\mathfrak{Z}_i$ is the number of those
individuals which in $\mathfrak{Z}_i$ have the property $Q_m$. Since there are
$\kappa$ Q-properties, any $\mathfrak{Z}_i$ determines $\kappa$ Q-numbers. In
$\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$, these Q-numbers, say,
$N_1, N_2, \ldots, N_\kappa$, are finite; their sum is $N$. In $\mathfrak{Z}_i$
in $\mathfrak{L}^\pi_\infty$, a Q-number is finite or infinite; their sum is
$\aleph_0$ (denumerably infinite, D40-8a); hence at least one of them is
$\aleph_0$.
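
The passage from a state-description to its Q-form and Q-numbers in a finite system can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a state-description is represented (hypothetically) as a dictionary that assigns to each individual constant a tuple of truth-values, one for each primitive predicate, and the Q-predicates are numbered in lexicographical order with $Q_1$ as the conjunction of all unnegated predicates.

```python
from collections import Counter


def q_index(values):
    """1-based lexicographical index of the Q-predicate for one individual.

    values[i] is True if the individual has the (i+1)th primitive property.
    Assumption: Q1 is the conjunction of all unnegated predicates.
    """
    index = 0
    for has_property in values:
        index = 2 * index + (0 if has_property else 1)
    return index + 1


def q_form(state_description):
    """Map every individual to the index of the one Q-predicate it satisfies."""
    return {ind: q_index(vals) for ind, vals in state_description.items()}


def q_numbers(state_description, num_predicates):
    """List [N_1, ..., N_kappa] of Q-numbers; their sum is the number of individuals."""
    counts = Counter(q_form(state_description).values())
    return [counts[m] for m in range(1, 2 ** num_predicates + 1)]


if __name__ == "__main__":
    z = {"a1": (True, False), "a2": (True, False), "a3": (False, False)}
    print(q_form(z))
    print(q_numbers(z, 2))
```

In the usage example, two individuals fall under $Q_2$ and one under $Q_4$, so the four Q-numbers add up to $N = 3$.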

We shall now show that isomorphism of $\mathfrak{Z}$ means the same as identity
of the Q-numbers, both in finite systems (T3) and in the infinite system (T4).


+T34-3. Let $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ be two $\mathfrak{Z}$ in
$\mathfrak{L}^\pi_N$. Let the Q-numbers of $\mathfrak{Z}_i$ be
$N_1, N_2, \ldots, N_\kappa$ and those of $\mathfrak{Z}_i'$
$N_1', N_2', \ldots, N_\kappa'$. $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ are
isomorphic if and only if they have the same Q-numbers (i.e., for every $m$
from 1 to $\kappa$, $N_m = N_m'$).


Proof. Let $i$ and $i'$ be the Q-forms of $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$,
respectively. 1. Let $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ be isomorphic. Then
there is an $\mathfrak{in}$-correlation $C$ in $\mathfrak{L}^\pi_N$ such that
$\mathfrak{Z}_i'$ is constructed from $\mathfrak{Z}_i$ by $C$ (D26-3a, D26-4b),
and analogously $i'$ from $i$. For every $m$ from 1 to $\kappa$, the $N_m$ full
sentences of ‘$Q_m$’ in $i$ contain $N_m$ different $\mathfrak{in}$, and the full
sentences of ‘$Q_m$’ in $i'$ contain the $C$-correlates of those $\mathfrak{in}$.
Their number must likewise be $N_m$, since $C$ is one-one. On the other hand, the
number of the latter full sentences is $N_m'$. Therefore, $N_m = N_m'$. 2. For
every $m$ from 1 to $\kappa$, let $N_m = N_m'$. Now we construct a correlation
$C$ in the following way. For every $m$, we correlate the $N_m$ $\mathfrak{in}$
which occur with ‘$Q_m$’ in $i$ in an arbitrary way with the $N_m'$
$\mathfrak{in}$ which occur with ‘$Q_m$’ in $i'$. This is possible because
$N_m = N_m'$. Since no $\mathfrak{in}$ occurs with more than one Q-predicate in
$i$, and likewise in $i'$, $C$ is one-one. Since every $\mathfrak{in}$ of
$\mathfrak{L}^\pi_N$ occurs in $i$ and likewise in $i'$, $C$ is an
$\mathfrak{in}$-correlation for $\mathfrak{L}^\pi_N$. $i'$ differs from $C(i)$ at
most in the order of the components. Hence, $\mathfrak{Z}_i'$ is constructed from
$\mathfrak{Z}_i$ by $C$. Therefore, $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ are
isomorphic.


T34-4. Let $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ be two $\mathfrak{Z}$ in
$\mathfrak{L}^\pi_\infty$. Let the Q-numbers of $\mathfrak{Z}_i$ be
$u_1, u_2, \ldots, u_\kappa$ and those of $\mathfrak{Z}_i'$
$u_1', u_2', \ldots, u_\kappa'$. $\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ are
isomorphic if and only if they have the same Q-numbers (i.e., for every $m$
from 1 to $\kappa$, $u_m = u_m'$).

The proof is analogous to that of T3 and even simpler, because here
$\mathfrak{Z}_i$ and $\mathfrak{Z}_i'$ are classes and hence no analogue of the
complication connected with the order of conjunctive components occurs.


D34-4. The Q-form of $\mathfrak{Str}_j$ in $\mathfrak{L}^\pi_N$
$=_{\mathrm{Df}}$ the expression formed from $\mathfrak{Str}_j$ by replacing
every $\mathfrak{Z}_i$ occurring as a disjunctive component in
$\mathfrak{Str}_j$ by its Q-form.

We found that the Q-form of a $\mathfrak{Z}$ corresponds to an individual
distribution for all $\mathfrak{in}$ with respect to the Q-division. Therefore,
the Q-form of any $\mathfrak{Str}_j$ corresponds to a statistical distribution
(T6); they differ at most in the order of conjunctive and disjunctive components.


+T34-6. Any $\mathfrak{Str}_j$ in $\mathfrak{L}^\pi_N$ is L-equivalent to a
statistical distribution for all $\mathfrak{in}$ in $\mathfrak{L}^\pi_N$ with
respect to the Q-division. (From T1, D27-1, D26-6b, c.)


Any $\mathfrak{Str}_j$ in $\mathfrak{L}^\pi_N$ states those features of the
domain of individuals of $\mathfrak{L}^\pi_N$ which are common to the isomorphic
$\mathfrak{Z}$ belonging to $\mathfrak{Str}_j$. What isomorphic $\mathfrak{Z}$ in
$\mathfrak{L}^\pi_N$ have in common is the ordered $\kappa$-tuple of Q-numbers
(T3). Hence, any $\mathfrak{Str}_j$ in $\mathfrak{L}^\pi_N$ states no more and no
less than a certain set of $\kappa$ Q-numbers. Any $\mathfrak{Str}_j$ determines
uniquely the Q-numbers, as seen from its Q-form. And, conversely, any arbitrary
ordered $\kappa$-tuple of numbers whose sum is $N$ determines, if taken as
Q-numbers, uniquely a certain $\mathfrak{Str}_j$ in $\mathfrak{L}^\pi_N$; we find
$\mathfrak{Str}_j$ by constructing the disjunction of all those $\mathfrak{Z}$
in $\mathfrak{L}^\pi_N$ (in the lexicographical order) which have the given
numbers as Q-numbers. We have earlier talked loosely of the structure described
by a $\mathfrak{Str}$. We may now give a more precise meaning to the term
‘structure’. We might say, if we wish to, that, with respect to systems
$\mathfrak{L}^\pi$, the structure of $\mathfrak{Z}_i$ is the sequence (ordered
$\kappa$-tuple) of the Q-numbers of $\mathfrak{Z}_i$.

It is easily seen that the Q-numbers do indeed determine all structural
features of the $\mathfrak{pr}$. The structure of a $\mathfrak{pr}$ of degree one
or of the property designated by it is its cardinal number; all other structural
properties (for instance, universality or emptiness) are determined by the
cardinal number. The cardinal number $n_i$ of any $\mathfrak{pr}_i$, for a given
$\mathfrak{Z}_j$ or $\mathfrak{Str}_j$, is determined by the Q-numbers in the
following way. $\mathfrak{pr}_i$ is L-equivalent to a disjunction of certain
Q-predicates (T33-1a); since any two Q-predicates are L-exclusive (T31-2a),
$n_i$ is the sum of the Q-numbers of these Q-predicates. Likewise, the cardinal
number of any molecular predicate expression and any molecular predicate is
determined by the Q-numbers. (This follows analogously from T31-2f.)
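
How the cardinal number of a primitive predicate is obtained from the Q-numbers can be illustrated by a short Python sketch. It rests on two assumptions that are not stated in this form in the text: the Q-predicates are numbered in lexicographical order with $Q_1$ as the conjunction of all unnegated predicates, and the Q-numbers are given as a list in that order. The function name is hypothetical.

```python
def cardinal_number(q_numbers, predicate_index, num_predicates):
    """Sum of the Q-numbers of those Q-predicates in which pr_i occurs unnegated.

    q_numbers[m] is the Q-number of Q_{m+1}; predicate_index is i (1-based).
    Assumption: in the lexicographical order, the bits of m (most significant
    first) tell which primitive predicates are negated (bit 1) in Q_{m+1}.
    """
    shift = num_predicates - predicate_index
    total = 0
    for m, count in enumerate(q_numbers):
        negated = (m >> shift) & 1
        if not negated:
            total += count
    return total


if __name__ == "__main__":
    # Two primitive predicates, three individuals: two are Q2, one is Q4.
    numbers = [0, 2, 0, 1]
    print(cardinal_number(numbers, 1, 2))
    print(cardinal_number(numbers, 2, 2))
```

With the Q-numbers of the usage example, the first primitive property has two instances and the second is empty.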

We have defined the structure-descriptions ($\mathfrak{Str}$) only for finite
systems. That a structure is determined by the $\kappa$ Q-numbers holds also for
$\mathfrak{L}^\pi_\infty$ (because of T4). However, in general, a structure in
$\mathfrak{L}^\pi_\infty$ is not expressible by a sentence in this system
$\mathfrak{L}^\pi_\infty$ itself. As mentioned earlier, we may regard classes of
isomorphic $\mathfrak{Z}$ as representations of structures in
$\mathfrak{L}_\infty$. However, they are classes of classes of sentences, and
hence much more complex than sentences; we shall not use them within our theory.
If a structure in $\mathfrak{L}^\pi_\infty$ fulfils a certain special condition,
it is expressible by a sentence. This condition can easily be stated with the
help of the Q-numbers.


We can find the condition in the following way. In $\mathfrak{L}_\infty$, as in
any finite system, we can easily construct a sentence $j_n$ which says that a
given property, expressed by any matrix of the system in question, has a certain
finite cardinal number. This may be done either in the customary way with the
help of existential quantifiers and ‘=’ (see Hilbert and Bernays [Grundlagen],
I, 174) or, in $\mathfrak{L}_N$, also in the form of a disjunction as in the
statistical distributions and the $\mathfrak{Str}$. However, there is no sentence
in $\mathfrak{L}_\infty$ which says that the cardinal number of a given factual
property is infinite or that it is finite; this can only be expressed by means
of attribute variables, which do not occur in our systems $\mathfrak{L}$ (see the
similar remark concerning the domain of individuals in the paragraph in small
print at the end of § 20). [There are sentences in our systems from which it
follows that the cardinal number of a certain property is finite or that it is
infinite, which, however, say more than this. For instance, there is a sentence
$j$ which says that at most five individuals have the property $M$. It obviously
follows from $j$ that $M$ is finite. And, in $\mathfrak{L}_\infty$, it follows
from $j$ that ${\sim}M$ is infinite, because it can be seen from the semantical
rules of $\mathfrak{L}_\infty$ that the number of individuals is infinite,
although this cannot be expressed by a sentence in $\mathfrak{L}_\infty$.] As we
have seen, any structure in $\mathfrak{L}^\pi_\infty$ is determined by the
Q-numbers. At least one Q-number is infinite; it is $\aleph_0$ (D40-8a), the
smallest infinite cardinal number, because the number of individuals in
$\mathfrak{L}_\infty$ is $\aleph_0$. Now let us consider a structure of which all
Q-numbers except one are finite. Then there is a sentence $i$ which attributes to
the $\kappa - 1$ finite Q-properties those finite cardinal numbers which they
have in that structure. Since in $\mathfrak{L}^\pi_\infty$ at least one Q-number
is $\aleph_0$, it follows from $i$ that the one remaining Q-property has the
cardinal number $\aleph_0$ (though this alone is not expressible in
$\mathfrak{L}^\pi_\infty$). Thus, $i$ states, explicitly or implicitly, all
Q-numbers and hence describes the structure in question. We can easily see that
this is only possible if all Q-numbers except one are finite. For, consider a
structure of which two Q-numbers are infinite and the others finite. We can again
construct a sentence $j$ which attributes to these $\kappa - 2$ finite
Q-properties their finite Q-numbers. It follows from $j$ in
$\mathfrak{L}^\pi_\infty$ that at least one of the other two Q-properties is
infinite; but it does not follow that both are infinite. The structure in
question cannot be described by a sentence in $\mathfrak{L}^\pi_\infty$. However,
we can of course describe this and any other structure in our metalanguage (by
means of the word ‘infinite’ or the sign ‘$\aleph_0$’) and make statements about
it.
§ 35. Some Numbers Connected with the Systems $\mathfrak{L}^\pi$


The numbers of the $\mathfrak{Z}$ and of the $\mathfrak{Str}$ in any system
$\mathfrak{L}^\pi_N$ are stated (T1) in terms of $N$, $\pi$, and $\kappa$ (§ 31).
A table (T2) gives the values of these and other numbers for some small systems
$\mathfrak{L}^\pi_N$. For any $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$, the
number of those $\mathfrak{Z}$ which are isomorphic to $\mathfrak{Z}_i$ is given
as a function of the Q-numbers of $\mathfrak{Z}_i$ (T4); this number will be used
in inductive logic.

We have earlier (in § 29) introduced some numbers connected with the
systems $\mathfrak{L}$: $\beta$ (the number of the atomic sentences), $\zeta$
(the $\mathfrak{Z}$), $\rho$ (the $\mathfrak{R}$), $\tau$ (in $\mathfrak{L}_N$,
the $\mathfrak{Str}$; in $\mathfrak{L}_\infty$, the classes of isomorphic
$\mathfrak{Z}$). We found that $\zeta = 2^\beta$ and $\rho = 2^\zeta$ (T29-1).
These numbers apply, of course, also to the systems $\mathfrak{L}^\pi$, since
these are merely special cases of systems $\mathfrak{L}$. For the systems
$\mathfrak{L}^\pi$, we have further introduced the numbers $\pi$ (the
$\mathfrak{pr}$) and $\kappa$ (the Q-predicates), where $\kappa = 2^\pi$ (T31-1).

We shall now state some theorems concerning these numbers with respect to
the systems $\mathfrak{L}^\pi$. Important for our system of inductive logic is
only T1d, stating the number of the $\mathfrak{Str}$.


T35-1. The following holds for any system $\mathfrak{L}^\pi_N$.
a. $\beta = \pi N$.
b. $\zeta = \kappa^N$.
Proof. $\zeta = 2^\beta$ (T29-1a), $= 2^{\pi N}$ (a), $= (2^\pi)^N = \kappa^N$
(T31-1). We can also obtain the theorem directly with the help of T40-31c,
because the Q-forms of the $\mathfrak{Z}$ are the individual distributions of
the $N$ individuals among the $\kappa$ Q-properties.

c. $\rho = 2^\zeta = 2^{\kappa^N}$. (From T29-1b, (b).)
+d. $\tau = \binom{N+\kappa-1}{\kappa-1}$, i.e., $\dfrac{(N+\kappa-1)!}{N!\,(\kappa-1)!}$.
(For these mathematical notations see D40-1 and 2.)

Proof. From T40-33b, since the $\mathfrak{Str}$, as their Q-forms show,
correspond to the statistical distributions for the $N$ $\mathfrak{in}$ with
respect to the Q-division.

The subsequent table T2 gives the values of $\beta$, $\tau$, $\zeta$, and $\rho$
for some small systems $\mathfrak{L}^\pi_N$ with $\pi = 1$ to 3 and $N = 1$ to 10
(based on T1; or on T1a and d, T29-1a and b). The exact values are not important
for our purposes. The table is intended merely to give a general impression of
the order of magnitude of the values and, in particular, to show that $\zeta$
and, even more, $\rho$ increase at an enormous rate with increasing $\pi$ and
$N$. $\rho$ is also the number of propositions expressible by sentences in
$\mathfrak{L}^\pi_N$ (see remark following T29-2). This number is finite, but, as
the table shows, it takes immense values even for these very poor language
systems; we see, for instance, that to write down the number $\rho$ for
$\mathfrak{L}^3_{10}$ in the ordinary decimal notation would take more than three
hundred million digits.

T35-2. The numbers of atomic sentences ($\beta$), of $\mathfrak{Str}$ ($\tau$),
of $\mathfrak{Z}$ ($\zeta$), and of $\mathfrak{R}$ ($\rho$) for some small
systems $\mathfrak{L}^\pi_N$.

a. $\mathfrak{L}^1_N$; $\pi = 1$, $\kappa = 2$.

  N    $\beta$   $\tau$    $\zeta$           $\rho$
  1       1         2           2            4
  2       2         3           4            16
  3       3         4           8            256
  4       4         5          16            65 536
  5       5         6          32            4 294 967 296
  6       6         7          64            $1.85 \times 10^{19}$
  7       7         8         128            $3.42 \times 10^{38}$
  8       8         9         256            $1.17 \times 10^{77}$
  9       9        10         512            $1.37 \times 10^{154}$
 10      10        11        1024            $1.88 \times 10^{308}$

b. $\mathfrak{L}^2_N$; $\pi = 2$, $\kappa = 4$.

  N    $\beta$   $\tau$    $\zeta$           $\rho$
  1       2         4           4            16
  2       4        10          16            65 536
  3       6        20          64            $1.85 \times 10^{19}$
  4       8        35         256            $1.17 \times 10^{77}$
  5      10        56        1024            $1.88 \times 10^{308}$
  6      12        84        4096            $1.25 \times 10^{1\,233}$
  7      14       120      16 384            $2.43 \times 10^{4\,932}$
  8      16       165      65 536            $3.52 \times 10^{19\,729}$
  9      18       220     262 144            $1.53 \times 10^{78\,918}$
 10      20       286   1 048 576            $5.52 \times 10^{315\,672}$

c. $\mathfrak{L}^3_N$; $\pi = 3$, $\kappa = 8$.

  N    $\beta$   $\tau$        $\zeta$       $\rho$
  1       3         8               8        256
  2       6        36              64        $1.85 \times 10^{19}$
  3       9       120             512        $1.37 \times 10^{154}$
  4      12       330            4096        $1.25 \times 10^{1\,233}$
  5      15       792          32 768        $5.92 \times 10^{9\,864}$
  6      18      1716         262 144        $1.53 \times 10^{78\,918}$
  7      21     3 432       2 097 152        $3.05 \times 10^{631\,345}$
  8      24     6 435      16 777 216        $6.5 \times 10^{5\,050\,763}$
  9      27    11 440     134 217 728        c. $10^{4.04 \times 10^{7}}$
 10      30    19 448   1 073 741 824        c. $10^{3.23 \times 10^{8}}$
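
The order of magnitude of $\rho$ can be estimated without ever writing the number down, since $\log_{10}\rho = \zeta \cdot \log_{10} 2$. The Python sketch below is an illustration only, and its function names are hypothetical. It works with floating-point logarithms, so it yields the number of decimal digits and the approximate exponent; since the table is meant to convey rough magnitudes, leading digits obtained in this way need not agree with the printed entries.

```python
from math import floor, log10


def zeta(n, pi):
    """Number of state-descriptions: kappa ** N with kappa = 2 ** pi."""
    return (2 ** pi) ** n


def rho_log10(n, pi):
    """log10 of rho = 2 ** zeta, computed without building the huge number."""
    return zeta(n, pi) * log10(2)


def rho_decimal_digits(n, pi):
    """Number of digits needed to write rho in ordinary decimal notation."""
    return floor(rho_log10(n, pi)) + 1


if __name__ == "__main__":
    for n in (4, 5, 10):
        print(n, rho_decimal_digits(n, 1))
    print(rho_decimal_digits(10, 3))
```

The last call concerns the system with three primitive predicates and ten individuals, for which the digit count exceeds three hundred million.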


T35-3. Let $m$ Q-predicates in $\mathfrak{L}^\pi_N$ be given
($0 \leq m < \kappa$). Let $\tau_m$ be the number of those $\mathfrak{Str}$ in
which these $m$ Q-predicates (but not necessarily only these) are empty. Then

$$\tau_m = \binom{N+\kappa-m-1}{\kappa-m-1} = \frac{(N+\kappa-m-1)!}{N!\,(\kappa-m-1)!}.$$

Proof. From T40-33a, in analogy to T1d, since the $\mathfrak{Str}$ described
correspond to the statistically different kinds of distributions of the $N$
individuals among the remaining $\kappa - m$ Q-properties.
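
The formula for $\tau_m$ can be compared with a direct count in a small system. In the following Python sketch, which is an illustration with hypothetical function names, a structure is represented by its $\kappa$-tuple of Q-numbers, and the Q-predicates required to be empty are given by their positions in that tuple.

```python
from itertools import product
from math import comb


def tau_m(n, kappa, m):
    """Number of structures in which m given Q-predicates are empty (formula)."""
    return comb(n + kappa - m - 1, kappa - m - 1)


def count_with_empty(n, kappa, empty_positions):
    """Count Q-number tuples with sum n that are 0 at the given positions."""
    return sum(1 for t in product(range(n + 1), repeat=kappa)
               if sum(t) == n and all(t[p] == 0 for p in empty_positions))


if __name__ == "__main__":
    print(tau_m(4, 4, 1), count_with_empty(4, 4, [0]))
    print(tau_m(4, 4, 2), count_with_empty(4, 4, [1, 3]))
```

For $m = 0$ the formula reduces to the number $\tau$ of all structures.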


T4 will be of importance for inductive logic. It states the number $\zeta_i$ of
those $\mathfrak{Z}$ in $\mathfrak{L}^\pi_N$ which are isomorphic to a given
$\mathfrak{Z}_i$, as a function of the Q-numbers. The subsequent analogue for
$\mathfrak{L}^\pi_\infty$ (T7) is less important.


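+T35-4. Let the Q-numbers for $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_N$ be
$N_1, N_2, \ldots, N_\kappa$. Let $\mathfrak{Str}_j$ be the one $\mathfrak{Str}$
corresponding to $\mathfrak{Z}_i$ (D27-1a). (Hence, $\mathfrak{Str}_j$ is
characterized by the given Q-numbers.) Let $\zeta_i$ be the number of those
$\mathfrak{Z}$ which belong to $\mathfrak{Str}_j$; in other words, those which
are isomorphic to $\mathfrak{Z}_i$. Then

$$\zeta_i = \frac{N!}{N_1!\,N_2!\cdots N_\kappa!}.$$

(From T34-1, T34-3, T40-32b.)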
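
For small systems the value of $\zeta_i$ can be compared with an explicit enumeration of the state-descriptions that share the given Q-numbers. The Python sketch below is an illustration with hypothetical names; a state-description is again represented by its Q-form, written as a tuple of Q-predicate positions, one for each individual.

```python
from collections import Counter
from itertools import product
from math import factorial


def zeta_i(q_numbers):
    """N! / (N_1! N_2! ... N_kappa!) for the given list of Q-numbers."""
    result = factorial(sum(q_numbers))
    for n_m in q_numbers:
        result //= factorial(n_m)  # exact at every step (multinomial coefficient)
    return result


def count_isomorphic(q_numbers):
    """Count the Q-forms whose Q-numbers equal the given list."""
    kappa = len(q_numbers)
    n = sum(q_numbers)
    target = list(q_numbers)
    count = 0
    for z in product(range(kappa), repeat=n):
        occurrences = Counter(z)
        if [occurrences[m] for m in range(kappa)] == target:
            count += 1
    return count


if __name__ == "__main__":
    print(zeta_i([2, 1, 0, 1]), count_isomorphic([2, 1, 0, 1]))
```

Summing $\zeta_i$ over all admissible tuples of Q-numbers gives back $\zeta = \kappa^N$, which is a convenient consistency check.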
T6 and T7 are the analogues for $\mathfrak{L}^\pi_\infty$ to T1d and T4,
respectively. They will not be used in inductive logic (see remark preceding
T29-4).


T35-6. In $\mathfrak{L}^\pi_\infty$, $\tau = \aleph_0$.


Proof. Every $\mathfrak{Str}$ corresponds to an assignment of one of the values
$\aleph_0, 0, 1, 2, \ldots$ to each of the $\kappa$ Q-properties in such a way
that the value $\aleph_0$ is assigned at least once. Therefore, $\tau$, the
number of the $\mathfrak{Str}$, is a finite multiple of the number of
possibilities for assigning one of the values $\aleph_0, 0, 1, 2, \ldots$ to each
of $\kappa - 1$ Q-properties. The number of the values is $\aleph_0$; hence the
number of the possibilities described is $\aleph_0^{\kappa-1}$ (T40-31b),
$= \aleph_0$ (T40-26e). Therefore, $\tau = \aleph_0$ (T40-26b).


T35-7. Let the Q-numbers for $\mathfrak{Z}_i$ in $\mathfrak{L}^\pi_\infty$ be
$u_1, u_2, \ldots, u_\kappa$. (Each of these numbers is finite or $\aleph_0$; at
least one is $\aleph_0$.) Let $\zeta_i$ be the number of those $\mathfrak{Z}$
which are isomorphic to $\mathfrak{Z}_i$.

a. If one Q-number is $\aleph_0$ and all others are 0, then $\zeta_i = 1$.

b. If exactly one Q-number is $\aleph_0$ and at least one other Q-number is
positive, then $\zeta_i = \aleph_0$.

c. If more than one Q-number is $\aleph_0$, then $\zeta_i = \aleph_1$.


Proof for (a), (b), (c). The $\mathfrak{Z}$ which are isomorphic to
$\mathfrak{Z}_i$ represent those individual distributions of the $\aleph_0$
individuals among the $\kappa$ Q-properties in which the Q-properties have the
given cardinal numbers $u_1, \ldots, u_\kappa$. Let $n_1$ of these $\kappa$
Q-numbers be 0, $n_2$ of them finite and positive, and $n_3$ of them $\aleph_0$
($n_1 + n_2 + n_3 = \kappa$; $1 \leq n_3 \leq \kappa$; hence,
$0 \leq n_1 + n_2 \leq \kappa - 1$). Suppose $u_m$ is finite and positive; then
the number of possibilities for selecting $u_m$ from the $\aleph_0$ individuals
for $Q_m$ is $\aleph_0$ (T40-32f). After these $u_m$ individuals have been
removed, there remain still $\aleph_0 - u_m = \aleph_0$ individuals (T40-26a).
Therefore, the number of possibilities for each of the $n_2$ finite and positive
Q-numbers is the same as for $u_m$, hence $\aleph_0$. Thus, the number of
possibilities for these $n_2$ Q-properties together is $\aleph_0^{n_2}$. This is
also the number of possibilities for the $n_1 + n_2$ finite Q-properties
together, because for the $n_1$ empty Q-properties there is obviously only one
possibility.

1. For (a) and (b). Here, $n_3 = 1$. For any distribution of individuals among
the $\kappa - 1$ finite Q-properties, there is just one possibility for the
infinite Q-property; it must contain all remaining individuals. Thus, the number
of distributions is here the same as that for the $n_1 + n_2$ finite
Q-properties, which we found to be $\aleph_0^{n_2}$. For (a), $n_2 = 0$; hence,
$\zeta_i = \aleph_0^0 = 1$ (T40-26d). For (b), $n_2 > 0$; hence
$\zeta_i = \aleph_0^{n_2} = \aleph_0$ (T40-26e).

2. For (c). Here $n_3 > 1$. We take the $n_3$ infinite Q-properties in any order.
For the first of them, say, $Q_p$, the number of possibilities is the number of
individual distributions of $\aleph_0$ individuals among the two properties $Q_p$
and ${\sim}Q_p$ in such a way that each of them has $\aleph_0$ individuals, hence
$2^{\aleph_0}$ (T40-32g), $= \aleph_1$ (D40-8b). The same holds for each of the
other infinite Q-properties except the last. Hence, the number of possibilities
for $n_3 - 1$ of the $n_3$ infinite Q-properties is
$\aleph_1^{n_3-1} = \aleph_1$ (T40-26e). The number of possibilities for the
finite Q-properties is, as we found earlier, either 1 or $\aleph_0$. If the
distribution among all Q-properties except the last infinite Q-property is given,
then there is only one possibility for the last infinite Q-property; it must
contain all remaining individuals. Therefore, the number $\zeta_i$ in this case
(c) is either $1 \times \aleph_1$ or $\aleph_0 \times \aleph_1$, hence $\aleph_1$
(T40-26c).


§ 37. Simple Laws 


A sentence consisting of a universal quantifier and a matrix as its scope
which does not contain a quantifier is called a simple law (D1); if, moreover,
neither ‘=’ nor any $\mathfrak{in}$ occurs in it, the sentence is called an
unrestricted simple law (D2a). Let $l$ be an unrestricted simple law in
$\mathfrak{L}^\pi$. Then $l$ asserts that a certain molecular property, say,
$M_1$, is empty. If $M_1$ is a Q-property, $l$ is called an unrestricted Q-law
(D4b). Let the width (§ 32) of $M_1$ be $w > 0$; then $M_1$ is L-equivalent to a
disjunction of $w$ Q-properties. Therefore, $l$ says in this case that these $w$
Q-properties are empty, and hence can be transformed into a conjunction of $w$
unrestricted Q-laws (T2b). $l$ is then said to have the strength $w$ (D6a).


We use the term ‘law’ here in the sense of ‘natural law’ or ‘physical law’, 
hence for universal sentences and chiefly for factual ones. (The so-called 
laws of logic are here rather called principles or theorems of logic.) In dis- 
cussions of inductive logic, laws have always had a prominent place, and 
some authors have even gone so far as to define induction as a kind of non- 
deductive inference leading to laws. We conceive inductive logic in a 
much wider sense so that the hypothesis obtained or judged in induction 
may have any form whatever. We regard the case of a universal hypothesis 
as merely a special kind of induction, called universal induction. However, 
it is indeed a case of great importance. In preparation for the later
treatment of laws in inductive logic, we shall here explain some of their
properties in deductive logic (§§ 37 and 38).

Let $l$ be a universal sentence $(\mathfrak{i}_k)(\mathfrak{M}_i)$. We shall
chiefly deal with the simplest case where the scope $\mathfrak{M}_i$ is
nongeneral, that is, does not contain a quantifier. In this case, we shall call
$l$ a simple law (D1).

Let $l$ be a simple law. We shall define two special kinds; they are mutually
exclusive but do not exhaust all possibilities. 1. If the scope $\mathfrak{M}_i$
contains neither ‘=’ nor any $\mathfrak{in}$, then $l$ speaks in a purely general
way about the individuals of the system in question without referring to any
particular individual. In this case, we shall call $l$ an unrestricted simple law
(D2a). [The exclusion of ‘=’ in addition to that of the $\mathfrak{in}$ is
inessential; for, if no $\mathfrak{in}$ occurs, $\mathfrak{i}_k$ is the only
individual sign occurring; hence, ‘=’ can occur only in the context
$\mathfrak{i}_k = \mathfrak{i}_k$, which is L-interchangeable with ‘$t$’
(T24-2a, T23-4c).] 2. Sometimes we wish to attribute a certain property $M$ not
to all individuals without restriction, but to all individuals with the exclusion
of some specified individuals, leaving it open whether or not these specified
individuals have the property $M$. We can do this by a formulation like the
following: ‘$(x)(x \neq a \,.\, x \neq b \supset Mx)$’, where $a$ and $b$ are the
individuals excluded; hence by ‘$(x)(x = a \lor x = b \lor Mx)$’. Simple laws of
this kind will be called restricted simple laws (D2b).


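D37-1. $l$ is a simple law in a system $\mathfrak{L}$ $=_{\mathrm{Df}}$ $l$ is a
sentence of the form $(\mathfrak{i}_k)(\mathfrak{M}_i)$, where $\mathfrak{M}_i$
is nongeneral.


D37-2. Let $l$ be a simple law $(\mathfrak{i}_k)(\mathfrak{M}_i)$ in
$\mathfrak{L}$.

a. $l$ is an unrestricted simple law $=_{\mathrm{Df}}$ $\mathfrak{M}_i$ contains
neither ‘=’ nor any $\mathfrak{in}$.

b. $l$ is a restricted simple law $=_{\mathrm{Df}}$ $\mathfrak{M}_i$ is
$\mathfrak{M}_j \lor \mathfrak{M}_h$, where $\mathfrak{M}_j$ is a disjunction of
$n$ components of the form $\mathfrak{i}_k = \mathfrak{in}$ with $n$ distinct
$\mathfrak{in}$ ($n \geq 1$; in $\mathfrak{L}_N$, $n < N$) and $\mathfrak{M}_h$
contains neither ‘=’ nor any $\mathfrak{in}$.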


The discussion in this and the next section concerns simple laws in the
systems $\mathfrak{L}^\pi$. We shall see that here the use of the Q-predicates
will make the analysis of the deductive properties of laws very simple and
effective; the same method will later be of great help in inductive logic.

Let $l$ be an unrestricted simple law in $\mathfrak{L}^\pi$, say,
$(\mathfrak{i}_k)(\mathfrak{M}_i)$, where $\mathfrak{i}_k$ is ‘$x$’. Let ‘$M_1$’
be defined as a molecular predicate in such a way that ‘$M_1x$’ is L-equivalent
to ${\sim}\mathfrak{M}_i$ and hence ‘${\sim}M_1x$’ is L-equivalent to
$\mathfrak{M}_i$. Then $l$ can be transformed into ‘$(x)({\sim}M_1x)$’ or
‘${\sim}(\exists x)(M_1x)$’. This shows that every unrestricted simple law says
that a certain molecular property, here $M_1$, is empty.


Example. Suppose that ‘Swan’ and ‘White’ are defined in some way or other
as molecular predicates in $\mathfrak{L}^\pi$. Then ‘all swans are white’ can be
formulated as an unrestricted simple law in $\mathfrak{L}^\pi$:
‘$(x)(\mathrm{Swan}\,x \supset \mathrm{White}\,x)$’, that is,
‘$(x)({\sim}\mathrm{Swan}\,x \lor \mathrm{White}\,x)$’. Now we define ‘$M_1$’ in
the way described, transforming the negation of the scope in an obvious way:
‘$M_1x$’ for ‘$\mathrm{Swan}\,x \,.\, {\sim}\mathrm{White}\,x$’. Thus, ‘$M_1$’
designates the property Non-White-Swan. Hence, the law says that this property
is empty, in other words, that there are no non-white swans.


Let the logical width (D32-1a) of ‘$M_1$’ be $w$. Then ‘$M_1$’ is L-empty if
and only if $w = 0$ (T32-2a). Otherwise, $w \geq 1$, and ‘$M_1$’ is L-equivalent
to a disjunction of $w$ Q-predicates, say, ‘$Q_{m_1}$’, ..., ‘$Q_{m_w}$’. Then,
$l$ is L-equivalent to ‘$(x)[{\sim}(Q_{m_1} \lor \ldots \lor Q_{m_w})x]$’, hence
to ‘$(x)[{\sim}(Q_{m_1}x \lor \ldots \lor Q_{m_w}x)]$’, hence to
‘$(x)[{\sim}Q_{m_1}x \,.\, \ldots \,.\, {\sim}Q_{m_w}x]$’ (T21-5f(2)), hence to
‘$(x)({\sim}Q_{m_1}x) \,.\, \ldots \,.\, (x)({\sim}Q_{m_w}x)$’ (T22-9k). Thus we
have here a conjunction of $w$ components, each of which is a law whose scope is
the negation of a Q-matrix. We shall call a law that has this form, or is
L-equivalent to a sentence of this form, an unrestricted Q-law (D4b). It is a law
in which the property declared empty is a Q-property. Thus we have obtained the
result that, if $l$ is not L-true and hence ‘$M_1$’ has a width $w \geq 1$,
then there is a unique set of $w$ Q-predicates such that $l$ can be transformed
into a conjunction of $w$ unrestricted Q-laws with these Q-predicates.
Obviously, the greater the number $w$, the more is asserted by the law $l$.
Therefore, we shall call $w$ the (logical) strength of $l$ (D6a). The relative
width of ‘$M_1$’ is $w/\kappa$ (D32-1b). This we shall call the relative
(logical) strength of the law $l$ (D6b). Each of the $w$ Q-predicates occurring
in the above transformation of $l$ is declared empty by one of the Q-laws
occurring as conjunctive components. Therefore we shall say that these $w$
Q-predicates, and the Q-properties designated by them, are excluded by the
law $l$ (D5). Thus we lay down the following definitions.

+D37-4. For $\mathfrak{L}^\pi$.

a. $l$ is a Q-law $=_{\mathrm{Df}}$ $l$ is a simple law, and the subdisjunction
of the scope which contains those disjunctive components which are free of ‘=’
is L-equivalent to the negation of a Q-matrix.

b. $l$ is an unrestricted Q-law $=_{\mathrm{Df}}$ $l$ is an unrestricted simple
law and a Q-law.

c. $l$ is a restricted Q-law $=_{\mathrm{Df}}$ $l$ is a restricted simple law
and a Q-law.

D37-5. Let $l$ be a simple law in $\mathfrak{L}^\pi$. The Q-predicate
$\mathfrak{A}_i$ (and the Q-property designated by it) is excluded by $l$
$=_{\mathrm{Df}}$ $l$ L-implies a Q-law containing $\mathfrak{A}_i$.

D37-6. Let $l$ be a simple law in $\mathfrak{L}^\pi$.
+a. The (logical) strength of $l$ $=_{\mathrm{Df}}$ the number of Q-predicates
excluded by $l$.

b. The relative (logical) strength of $l$ $=_{\mathrm{Df}}$ $w/\kappa$, where
$w$ is the strength of $l$.

The following theorems hold on the basis of the definitions of this sec- 
tion and the theorems on logical width (§ 32), according to our previous 
explanations. 

T37-1. Let $l$ be a simple law in $\mathfrak{L}^\pi$ with the strength $w$ and
hence the relative strength $q = w/\kappa$.

a. $l$ is L-true if and only if $w = 0$ and hence $q = 0$.

b. $l$ is L-false if and only if $w = \kappa$ and hence $q = 1$.

c. $l$ is factual if and only if $0 < w < \kappa$ and hence $0 < q < 1$. (From
(a), (b).)
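
For an unrestricted simple law, the strength can be found by going through all Q-predicates and counting those on which the scope fails. The following Python sketch is an illustration under explicit assumptions: the scope is given as a truth-function of the primitive predicates; ‘Swan’ and ‘White’ are treated as primitive predicates for simplicity, and a third predicate, here called ‘large’, is purely hypothetical.

```python
from fractions import Fraction
from itertools import product


def excluded_q_predicates(scope, num_predicates):
    """Q-predicates (as tuples of truth-values) on which the scope is false."""
    return [q for q in product([True, False], repeat=num_predicates)
            if not scope(*q)]


def strength(scope, num_predicates):
    """Number of Q-predicates excluded by the law with the given scope."""
    return len(excluded_q_predicates(scope, num_predicates))


def relative_strength(scope, num_predicates):
    """Strength divided by kappa = 2 ** num_predicates."""
    return Fraction(strength(scope, num_predicates), 2 ** num_predicates)


def all_swans_are_white(swan, white, large):
    """Scope of '(x)(Swan x > White x)'; 'large' is an unrelated third predicate."""
    return (not swan) or white


if __name__ == "__main__":
    print(excluded_q_predicates(all_swans_are_white, 3))
    print(strength(all_swans_are_white, 3), relative_strength(all_swans_are_white, 3))
```

A scope that is always true gives the strength 0 and a scope that is always false gives the strength $\kappa$, in agreement with (a) and (b).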

T37-2. Let $l$ be a simple law in $\mathfrak{L}^\pi$ with the strength $w$.

a. $l$ is a Q-law if and only if $w = 1$.

+b. If $w > 1$, then $l$ is L-equivalent to a conjunction of $w$ Q-laws with
$w$ distinct Q-predicates.


T37-3. Let ‘$M_1$’ be a molecular predicate with the width $w$.

a. ‘$(x)({\sim}M_1x)$’ is an unrestricted simple law with the strength $w$.

b. ‘$(x)(\ldots \lor {\sim}M_1x)$’, where a disjunction of $n$ =-matrices with
‘$x$’ and $n$ distinct $\mathfrak{in}$ ($n \geq 1$) stands at the place of
‘...’, is a restricted simple law with the strength $w$. (From D32-1a, D4, D5,
D6.)

T37-4. Let $l$ and $l'$ be simple laws in $\mathfrak{L}^\pi$ with the strengths
$w$ and $w'$, respectively.

a. If $\vdash l \equiv l'$ (but not only in this case), $w = w'$. (From T1, T2.)

b. If $\vdash l \supset l'$ (but not only in this case), $w \geq w'$. (From T1,
T2.)

c. If $l$ and $l'$ are unrestricted and $\vdash l \supset l'$ but not
$\vdash l \equiv l'$, then $w > w'$. (From T1, T2.)

d. If $l$ and $l'$ are unrestricted and $l$ is a Q-law and
$\vdash l \supset l'$, then $l'$ is either L-equivalent to $l$ or L-true. (From
T2a, (c), T1a.)


T4d says that, if $l$ is an unrestricted Q-law, then there is no factual
unrestricted simple law weaker than $l$.


§ 38. Simple Laws of Conditional Form 


A law $l$ of the form ‘$(x)(Mx \supset M'x)$’ gives rise to a division of all
individuals into four kinds (here represented by a diagram). The first of these
kinds is designated by ‘$M \,.\, {\sim}M'$’, abbreviated by ‘$M_1$’; this is the
property declared empty by $l$. We distinguish between those logical properties
of $l$ which are invariant with respect to L-equivalent transformations and those
which are not.

Most laws in science have conditional form, for example: ‘for every x
(thing or space-time point or the like), if x fulfils such and such conditions,
then such and such is the case with x’. Laws of this kind may be formulated
as universal conditional sentences.

Let $l$ be an unrestricted simple law of this form in $\mathfrak{L}^\pi$, say,
$(\mathfrak{i}_k)(\mathfrak{M}_i \supset \mathfrak{M}_j)$. We can abbreviate
$\mathfrak{M}_i$ by a full matrix of a molecular predicate, say, of ‘$M$’;
likewise $\mathfrak{M}_j$, with ‘$M'$’. Thus, $l$ becomes
‘$(x)(Mx \supset M'x)$’. (For instance, if ‘$M$’ means Swan and ‘$M'$’ White, we
have the example of § 37.) Obviously, $l$ can be transformed into the following
L-equivalent forms: ‘$(x)({\sim}Mx \lor M'x)$’,
‘$(x)[{\sim}(M \,.\, {\sim}M')x]$’, and, if we define ‘$M_1$’ as abbreviation for
‘$M \,.\, {\sim}M'$’, into ‘$(x)({\sim}M_1x)$’. Thus we see, as in our discussion
in the preceding section, that $l$ says that the property $M_1$, that is, the
property of being $M$ but not $M'$ (Non-White Swan) is empty. And here again, if
the width of ‘$M_1$’ is $w > 0$, then ‘$M_1$’ is L-equivalent to a disjunction
of $w$ Q-predicates; and these $w$ Q-predicates and no others are excluded by
$l$; hence $l$ has the strength $w$.

In a law of the form described, the two predicates ‘$M$’ and ‘$M'$’ are
usually factual and, moreover, logically independent of each other (this
means that it is not the case that either of them or its negation L-implies
the other one or its negation). In this case, the two predicates give rise to
a division of all individuals into four kinds. Let us designate these kinds
by four molecular predicates, defined in the following way (‘$M_1$’ is the
same as above):

‘$M_1$’ for ‘$M \,.\, {\sim}M'$’,

‘$M_2$’ for ‘$M \,.\, M'$’,

‘$M_3$’ for ‘${\sim}M \,.\, M'$’,

‘$M_4$’ for ‘${\sim}M \,.\, {\sim}M'$’.

Under the assumptions made concerning ‘$M$’ and ‘$M'$’, we see that the
four predicates ‘$M_p$’ ($p = 1$ to 4) are L-disjunct, that any two of them
are L-exclusive, that each of them is factual, and hence that they form
a division (in analogy to T31-2b, a, c, d).


[Diagram: a rectangle divided into four parts. Columns: M′ (White) and
~M′ (Non-White); rows: M (Swan) and ~M (Non-Swan).]


This division is represented by the accompanying diagram; references
to the previous example are added. The whole rectangle represents the
domain of individuals of the system 𝔏^π in question. It is divided into four
parts by the four properties M₁, M₂, M₃, and M₄. The thirty-two small
squares separated by dotted lines represent the Q-properties (we have
arbitrarily chosen κ = 32, and have divided this number into four parts
in an arbitrary way). Let the widths of ‘M₁’, ‘M₂’, ‘M₃’, ‘M₄’ be w₁, w₂,
w₃, w₄ respectively (in the diagram: 5, 3, 9, 15); hence, w₁ + w₂ + w₃ +
w₄ = κ. The shaded area represents the property M₁ which is declared
empty by the law l; hence the w₁ (five) small squares in this area correspond
to the w₁ (five) Q-properties excluded by l. l can be transformed into
a conjunction of w₁ Q-laws; each of them declares one of the w₁ small
shaded squares to be empty. We have called w₁ the strength of l; this seems
natural, for, the greater the number of Q-laws whose joint assertion is l,
the more is said by l.

For many problems concerning the law l, the distinction between M₃
and M₄ is of little or no interest. (For instance, when we intend to test the
law by the observation of individuals, then it is irrelevant whether a non-swan
found is white or non-white.) Therefore, in our later discussion in
inductive logic of laws of a form like l, we shall sometimes use a simplified
division of only three kinds designated by ‘M₁’, ‘M₂’, and ‘M₃,₄’, where
the latter predicate is defined by ‘M₃ ∨ M₄’, hence by ‘~M’; its width
is w₃,₄ = w₃ + w₄.

At the beginning of this section we have mentioned several L-equivalent
forms of the law l in terms of the predicates ‘M’ and ‘M′’. Of
L-equivalent forms which use instead some of the four predicates of the
division, ‘(x)(~M₁x)’ has already been mentioned; other simple forms
of this kind are, for example, ‘(x)(M₂x ∨ M₃x ∨ M₄x)’, ‘(x)(M₁x ∨
M₂x ⊃ M₂x)’, and ‘(x)(M₁x ∨ M₄x ⊃ M₄x)’.

When we carry out a logical analysis of a law, say, l, and, as a result,
ascribe to it certain logical (that is, L-semantical) properties either in
deductive or in inductive logic, we must distinguish between those properties
which are invariant with respect to L-equivalent transformation
and those which are not. The invariant properties may, in a certain sense,
be called properties of the content of l (if by ‘content’ we mean, without
giving an exact definition, something which L-equivalent sentences have in
common; for a possibility of explication see § 73) or properties of the
proposition expressed by l. On the other hand, the noninvariant properties
of l are properties of the formulation rather than the content. Among
the invariant properties of l are the L-concepts (for example, the properties
of being factual, or L-true, or L-false, or L-implying such and such
other sentences, or being an L-implicate of such and such other sentences,
and the like); further, in inductive logic, the properties connected
with the degree of confirmation of l or of a certain instance of l. Now let
us examine the properties of l discussed earlier in this section. It is an
invariant property of l that just the property M₁, which is the same as
the property of being M but not M′, is declared to be empty; likewise,
that certain Q-predicates are excluded, that their number is w₁ (in the
diagram: five), and hence that l has the strength w₁ (five). On the other
hand, it belongs to the noninvariant properties of l dependent upon the
formulation that the property referred to in the antecedent of the scope
is M and that referred to in the consequent is M′; that these properties
have certain widths (in the diagram: eight and twelve); that their conjunction
is the property M₂ with the width w₂ (in the diagram: three),
and the like. However, M and M′ determine also an invariant property
of l, viz., that the conjunction of the first and the negation of the second
is the property M₁. To sum it up, all invariances of l are based on this one
point, the emptiness of M₁.

In our later discussions of inductive logic, we shall find it important
to distinguish two kinds of problems with respect to the situation just
explained: (i) problems referring to a given law l; (ii) problems referring
to a pair of properties M, M′. If the pair of properties is given, the corresponding
law ‘(x)(Mx ⊃ M′x)’ is uniquely determined. On the other
hand, if the content of the law is given, the pair of properties is not uniquely
determined; there are many different pairs of properties yielding the
same content of the law. Thus, for example, the properties non-M′, non-M
lead to the formulation ‘(x)(~M′x ⊃ ~Mx)’, which is L-equivalent to
the one above (by transposition, T21-5h(1)).

Let us calculate the number p of different pairs of properties which yield laws
L-equivalent to a given factual unrestricted simple law j in 𝔏^π with the strength
w. (The result will not be used later but is merely intended to give a more precise
picture of the situation.) We are referring to pairs of properties, not to
pairs of matrices; the number of the latter pairs, in other words, of universal
conditional sentences L-equivalent to j, is obviously infinite. Suppose the sentences
(i_k)(𝔐_i ⊃ 𝔐_j) and (i_k)(𝔐_i′ ⊃ 𝔐_j′) are both L-equivalent to j. If here
𝔐_i is L-equivalent to 𝔐_i′ (D25-1g), then we say that the two matrices express
the same property; if, in addition, 𝔐_j is L-equivalent to 𝔐_j′, we say that the
two sentences, though different, correspond to the same pair of properties,
and hence we count them only as one for the number of pairs of properties.

j is supposed to be factual. Therefore, 0 < w < κ (T37-1c). Let 𝔎₁ be the
class of those w Q-predicates which are excluded by j, and let ‘M₁’ be defined
by the disjunction of these Q-predicates. Then ‘M₁’ has the width w and is
factual. Now any pair of properties M, M′ yields a law L-equivalent to j if
and only if ‘M . ~M′’ is L-equivalent to ‘M₁’. What we are searching for is the
number of pairs of this kind. We find it as follows. We divide the class of all
those Q-predicates which are not excluded by j (their number is κ − w ≥ 1)
in an arbitrary way into three mutually exclusive subclasses 𝔎₂, 𝔎₃, and 𝔎₄,
each of which may be empty. We define ‘M’ by the disjunction of the Q-predicates
of 𝔎₁ ∪ 𝔎₂ and ‘M′’ by the disjunction of those of 𝔎₂ ∪ 𝔎₃ (if this
class is empty, we define ‘M′’ by any L-empty predicate expression, e.g.,
‘P . ~P’). Then ‘M . ~M′’ is L-equivalent to ‘M₁’, and hence the pair of the
properties M, M′ is one of those we are looking for. And, inversely, if any pair
of properties M, M′ is such that ‘M . ~M′’ is L-equivalent to ‘M₁’, then this
pair can be obtained in the way described from one of the tripartitions of the
κ − w Q-predicates. It is easily seen that two pairs of properties are different
if and only if they are based on two different tripartitions (regarded as ordered
triples of subclasses). Therefore, the number p of pairs of properties searched
for is the same as the number of the tripartitions, i.e., the individual distributions
of κ − w elements among three classes; hence p = $3^{\kappa-w}$ (T40-31a). This
number includes two extreme cases, viz., the pair in which M is L-universal
(in this case M′ is the negation of M₁) and the pair in which M′ is L-empty (in
this case M is the same as M₁). [If we wish to count only those pairs in which
both properties are factual (represented by universal conditional sentences in
which both the antecedent and the consequent matrices are factual), the number
is $3^{\kappa-w} - 2$.] Thus we see that p is the greater, the smaller the strength w
of the law j.
Examples. In the example of the above diagram we had κ = 32, w₁ = 5.
Hence p = 3²⁷ = 7.6256 × 10¹². Suppose that in the given law j both components
of the conditional are basic matrices; for instance, let j be ‘(x)(P₁x ⊃
P₂x)’. Then M₁ is a conjunction of two basic properties (in the example, ‘M₁’
is L-equivalent to ‘P₁ . ~P₂’), and hence w = $2^{\pi-2}$ (T32-4a(2)) = κ/4. Hence,
κ − w = 3κ/4, and p = $3^{3\kappa/4}$. (For instance, in 𝔏⁴ we have κ = 16; hence, w = 4
and p = 3¹² = 531,441.) Here are a few examples of pairs of properties for the
sentence j mentioned above, represented by pairs of molecular predicate
expressions: ‘P₁’, ‘P₂’; ‘~P₂’, ‘~P₁’; ‘P₁ ∨ P₂’, ‘P₂’; ‘P₁ ∨ P₂’, ‘P₂ ∨ ~P₁’;
‘P₁’, ‘P₂ ∨ ~P₁’; ‘P₁’, ‘P₂ ∨ (~P₁ . ~P₃)’; ‘P₁ ∨ (P₂ . P₃)’, ‘P₂ ∨ ~P₁’.
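
The count p = 3^(κ − w) can be checked by brute force for a small system. The following Python sketch is an illustration added here, not part of the original text. It assumes that a property may be represented by the set of Q-predicates in its disjunction, with the Q-predicates simply numbered 0 to κ − 1; the names `subsets` and `count_pairs` are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def count_pairs(kappa, excluded):
    """Count pairs (M, M') of properties for which M . ~M' is the excluded class.

    Assumption of this sketch: a property is modelled as the set of
    Q-predicates whose disjunction defines it.
    """
    excluded = frozenset(excluded)
    all_props = list(subsets(range(kappa)))
    return sum(1 for m in all_props for m_prime in all_props
               if m - m_prime == excluded)

# kappa = 4 Q-predicates and a law of strength w = 1.
print(count_pairs(4, {0}), 3 ** (4 - 1))
# The exact value for the diagram (kappa = 32, w = 5).
print(3 ** 27)
```

The enumeration is feasible only for very small κ, since the number of properties is 2^κ; for the diagram's κ = 32 only the closed formula is practical.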


§ 40. Some Mathematical Definitions and Theorems 


Some mathematical definitions and theorems are listed for later reference.
The notations defined are: (A) ‘n!’ (D1), ‘$\binom{m}{n}$’ (D2), ‘$\genfrac{[}{]}{0pt}{}{m}{n}$’ (D3, analogous to
the preceding), ‘φ(u)’ (the normal function, D4a), ‘Φ(u)’ (the probability integral,
D4b); (B) ‘lim f(n)’ (D6a); (C) ‘α₀’, ‘α₁’, etc., for infinite cardinal numbers
(D8). (D) Some theorems of combinatorics (T29-T33) state the number of
permutations, of possible distributions, and the like.


In this section we list some mathematical definitions and theorems for 
the convenience of the reader. They are frequently used throughout this 
book, both in deductive logic, in the earlier sections of this chapter, and 
later in inductive logic. Almost all notations here defined are customary; 
the exceptions are D3 and D8. Almost all the theorems are well known. 
Therefore, we omit proofs in most cases. 

‘k’, ‘m’, ‘n’, and ‘p’ are used as variables for natural numbers (0, 1, 2,
etc.); ‘q’ and ‘r’ (and sometimes ‘u’) as variables for real numbers.

A. Some Mathematical Functions
+D40-1. Recursive definition of the factorial n!.
a. 0! =Df 1.
b. (n + 1)! =Df n!(n + 1).
c. If n is a negative integer, n! = ∞. (Seldom used.)


T40-1. If n ≥ 1, n! = $\prod_{p=1}^{n} p$ (i.e., 1 × 2 × 3 × ... × n).


T40-2. Table for the Factorial

  n        n!   log (n!)      n               n!   log (n!)
  1         1    0           11    3.9917 × 10^7    7.60116
  2         2    0.30103     12    4.7900 × 10^8    8.68034
  3         6    0.77815     13    6.2270 × 10^9    9.79428
  4        24    1.38021     14    8.7178 × 10^10  10.94041
  5       120    2.07918     15    1.3077 × 10^12  12.11650
  6       720    2.85733     16    2.0923 × 10^13  13.32062
  7      5040    3.70243     17    3.5569 × 10^14  14.55107
  8     40320    4.60552     18    6.4024 × 10^15  15.80634
  9    362880    5.55976     19    1.2165 × 10^17  17.08509
 10   3628800    6.55976     20    2.4329 × 10^18  18.38612


(Here and in the following, ‘log’ is used for common logarithms, i.e., on 
the base 10.) 
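
Readers who wish to extend the table can compute further entries directly. The following Python sketch is an illustration added here; it unfolds the recursion of D40-1 as a loop and prints n! together with its common logarithm. The function name `factorial` is chosen for this sketch, and negative arguments (D40-1c) are deliberately not covered.

```python
import math

def factorial(n):
    """n! following D40-1: 0! = 1 and (n + 1)! = n! * (n + 1)."""
    if n < 0:
        raise ValueError("this sketch covers natural numbers only")
    result = 1
    for k in range(1, n + 1):
        result *= k  # one step of the recursion (n + 1)! = n! (n + 1)
    return result

# n, n!, and log (n!) to five decimals, as in the table.
for n in (5, 10, 13):
    print(n, factorial(n), round(math.log10(factorial(n)), 5))
```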

This table and the tables for other functions in this section are not in- 
tended for serious statistical work in science. For this purpose more ex- 
tensive tables for these functions—and, moreover, for other functions 
not mentioned here—are given in textbooks on statistical methods and on 
the applications of the calculus of probabilities. Our tables are merely in- 
tended for the convenience of those readers who wish to calculate numeri- 
cal examples in inductive logic. It is of course advisable in any mathemati- 
cal discipline to study concrete examples in order to come to a better un- 
derstanding of the abstract theorems. In inductive logic there is an im- 
portant additional reason. We shall discuss various definitions for con- 
cepts of degree of confirmation or requirements for such definitions. We 
shall see that it is often hardly possible to judge the plausibility of definitions
or requirements, that is, their adequacy for an explication of probability₁,
by merely inspecting the definitions themselves. The judgment is
rather to be based on an investigation of the consequences to which the
definitions or requirements lead. Thus, the plausibility of a definition is
judged by the plausibility of the theorems derived from it; and this in
turn can often be judged in the easiest way by studying concrete numerical
examples. We shall often give such examples; and some readers will
wish to construct and analyze examples of their own.

The following two theorems give approximations for the factorial; they
are convenient and frequently used. These approximations and those
given in later theorems hold in the sense that the relative numerical error
is the smaller the more the restricting condition is fulfilled (in T4, the
larger n is; in T5a and b, the larger m is in relation to n², i.e., the larger
m/n² is), in such a way that the limit of the relative error is 0. That one
approximation is rougher than another means that the error is larger; in
other words, the qualifying condition must be fulfilled to a higher degree
in order to reduce the relative error to the same amount. (‘≅’ is used as
a sign of approximative equality.)


T40-4. Stirling’s Theorem. For sufficiently large n, the following
approximations hold.
a. $n! \cong \sqrt{2\pi n}\; n^n e^{-n} (1 + 1/(12n))$.
(‘π’ has here its usual mathematical meaning, which has, of course,
nothing to do with our ‘π’ in § 31. π ≅ 3.14159; $\sqrt{2\pi}$ ≅ 2.5066;
e ≅ 2.71828.)
b. (Rougher approximation.)
$n! \cong \sqrt{2\pi n}\; n^n e^{-n}$.
Even this approximation is already rather good for small n.
c. $\log (n!) \cong \log \sqrt{2\pi} + (n + 1/2) \log n - n \log e$. (From (b).)
($\log \sqrt{2\pi}$ = 0.39909; log e = 0.43429.)
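
The quality of the two approximations can be inspected numerically. The following Python sketch is an illustration added here; it reads T40-4b as n! ≅ √(2πn) nⁿ e⁻ⁿ and T40-4a as the same value multiplied by (1 + 1/(12n)), and prints the relative errors. The function names are chosen for this sketch only.

```python
import math

def stirling_rough(n):
    """T40-4b: sqrt(2*pi*n) * n**n * e**(-n)."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

def stirling_fine(n):
    """T40-4a: the rough value times (1 + 1/(12n))."""
    return stirling_rough(n) * (1 + 1 / (12 * n))

def relative_error(approx, exact):
    """Absolute difference divided by the exact value."""
    return abs(approx - exact) / exact

for n in (1, 5, 10, 20):
    exact = math.factorial(n)
    print(n,
          relative_error(stirling_rough(n), exact),
          relative_error(stirling_fine(n), exact))
```

Both columns of relative errors shrink as n grows, and the second shrinks faster, which is the sense in which (b) is the rougher approximation.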
T40-5. The following approximations hold if m is sufficiently large in
relation to n².
a. $(m + n)! \cong m!\; m^n \left(1 + \frac{n(n+1)}{2m}\right)$. (From T4.)
b. (Rougher approximation.)
$(m + n)! \cong m!\; m^n$. (From (a).)
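
A numerical check shows how the condition "m large in relation to n²" works. The following Python sketch is an illustration added here; it assumes that T40-5a reads (m + n)! ≅ m! mⁿ (1 + n(n + 1)/(2m)) and T40-5b reads (m + n)! ≅ m! mⁿ, and it compares both sides after division by m!. The function names are hypothetical.

```python
import math

def exact_ratio(m, n):
    """(m + n)! / m! as an exact integer: the product (m+1)(m+2)...(m+n)."""
    return math.prod(range(m + 1, m + n + 1))

def approx_a(m, n):
    """Assumed reading of T40-5a, divided by m!."""
    return m ** n * (1 + n * (n + 1) / (2 * m))

def approx_b(m, n):
    """Assumed reading of T40-5b, divided by m!."""
    return m ** n

# m = 1000 is large in relation to n**2 = 9; m = 10 is not.
for m, n in ((1000, 3), (10, 3)):
    exact = exact_ratio(m, n)
    print(m, n, approx_a(m, n) / exact, approx_b(m, n) / exact)
```

The closer a printed quotient is to 1, the better the approximation.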
+D40-2. The binomial coefficient $\binom{m}{n}$.
Let n ≥ 0.
a. For m ≥ n, $\binom{m}{n}$ =Df $\frac{m!}{n!\,(m-n)!}$.
b. (Seldom used.) For m < n, $\binom{m}{n}$ = 0.
(This function is called binomial coefficient, because the coefficients
in the binomial theorem T10a have this form. Other customary notations
for it: ‘ₘCₙ’, ‘Cᵐₙ’, ‘C(m, n)’.)

T40-7. Table for the Binomial Coefficient $\binom{m}{n}$

(Rows: m; columns: n.)

  m   0    1     2      3      4       5       6       7        8        9       10
  0   1
  1   1    1
  2   1    2     1
  3   1    3     3      1
  4   1    4     6      4      1
  5   1    5    10     10      5       1
  6   1    6    15     20     15       6       1
  7   1    7    21     35     35      21       7       1
  8   1    8    28     56     70      56      28       8        1
  9   1    9    36     84    126     126      84      36        9        1
 10   1   10    45    120    210     252     210     120       45       10        1
 11   1   11    55    165    330     462     462     330      165       55       11
 12   1   12    66    220    495     792     924     792      495      220       66
 13   1   13    78    286    715    1287    1716    1716     1287      715      286
 14   1   14    91    364   1001    2002    3003    3432     3003     2002     1001
 15   1   15   105    455   1365    3003    5005    6435     6435     5005     3003
 16   1   16   120    560   1820    4368    8008   11440    12870    11440     8008
 17   1   17   136    680   2380    6188   12376   19448    24310    24310    19448
 18   1   18   153    816   3060    8568   18564   31824    43758    48620    43758
 19   1   19   171    969   3876   11628   27132   50388    75582    92378    92378
 20   1   20   190   1140   4845   15504   38760   77520   125970   167960   184756


The value of $\binom{m}{n}$ for 10 < m ≤ 20, n > 10, is found in the table with
the help of T8d. For example, $\binom{18}{15} = \binom{18}{3} = 816$.
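
Entries beyond the table can be computed from the definition. The following Python sketch is an illustration added here; it follows D40-2 as read above (the quotient m!/(n!(m − n)!) for m ≥ n ≥ 0, and 0 for m < n) and repeats the example given for the table. The function name `binomial` is chosen for this sketch.

```python
import math

def binomial(m, n):
    """Binomial coefficient per D40-2: m!/(n!(m-n)!) for m >= n >= 0, else 0."""
    if m < 0 or n < 0:
        raise ValueError("this sketch covers natural numbers only")
    if m < n:
        return 0  # D40-2b, the seldom used case
    return math.factorial(m) // (math.factorial(n) * math.factorial(m - n))

# The example accompanying the table: both values are 816.
print(binomial(18, 15), binomial(18, 3))
# One full row of the table (m = 6).
print([binomial(6, n) for n in range(7)])
```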


T40-8.
a. $\binom{m}{0} = 1$.
b. $\binom{m}{m} = 1$.
c. $\binom{m}{1} = m$.
d. $\binom{m}{n} = \binom{m}{m-n}$.
3 e. A = mor). 
; a) EG) = G) 
T40-9.
; a Do Ge) 
A b2 CRN Cntr) 
: c. 5 IEn »)] = EF”); Gf k < m, itis sufficient to let the sum run 
g from # = 0 to k, because of Dab). 
Ly d. 2 HE] = KENY; the sum runs from # = o to k or k’; it is suf- 
fy ficient to take the smaller of these two. 


= 


T40-10. Let r₁ and r₂ be any real numbers, and n a positive integer.

a. The Binomial Theorem (not to be confused with the Binomial Law,
T95-1b, which is a theorem on probability based on the Binomial
Theorem).

$(r_1 + r_2)^n = \sum_{p=0}^{n} \binom{n}{p}\, r_1^{\,n-p}\, r_2^{\,p}$.


The function defined in D3 will be used very frequently in inductive 
logic. We choose for it a notation similar to that of the binomial coefficient, 
because we shall often state two analogous theorems, one (concerning an 
individual distribution) using this function while the other (concerning 
the corresponding statistical distribution) uses the binomial coefficient 
with the same arguments (see remark at the end of § 92). (Other notations:
‘ₘPₙ’, ‘Pᵐₙ’.)


+D40-3. For n ≥ 0, m ≥ n,


$\genfrac{[}{]}{0pt}{}{m}{n}$ =Df $\frac{m!}{(m-n)!}$.


T40-11. Table for $\genfrac{[}{]}{0pt}{}{m}{n}$

(Rows: m; columns: n.)

  m   0   1   2    3     4      5       6       7        8        9       10
  0   1
  1   1   1
  2   1   2   2
  3   1   3   6    6
  4   1   4  12   24    24
  5   1   5  20   60   120    120
  6   1   6  30  120   360    720     720
  7   1   7  42  210   840   2520    5040    5040
  8   1   8  56  336  1680   6720   20160   40320    40320
  9   1   9  72  504  3024  15120   60480  181440   362880   362880
 10   1  10  90  720  5040  30240  151200  604800  1814400  3628800  3628800


T40-12. $\genfrac{[}{]}{0pt}{}{m}{n}$ = m(m − 1)(m − 2) ... (m − n + 1); this is a product
of n descending factors, beginning with m.


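T40-13.
a. $\genfrac{[}{]}{0pt}{}{m}{0} = 1$.
b. $\genfrac{[}{]}{0pt}{}{m}{1} = m$.
c. $\genfrac{[}{]}{0pt}{}{m}{m} = m!$.
d. $\genfrac{[}{]}{0pt}{}{m}{m-1} = m!$.
e. $\genfrac{[}{]}{0pt}{}{m}{n} = \binom{m}{n}\, n!$.

T40-14. For m > n, $\frac{m!}{n!} = \genfrac{[}{]}{0pt}{}{m}{m-n}$.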
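
The bracket function is easy to compute as a product of descending factors. The following Python sketch is an illustration added here; it follows T40-12 (a product of n descending factors beginning with m) and checks the result against the relation to the binomial coefficient as read in T40-13e. The function name `descending_product` is chosen for this sketch.

```python
import math

def descending_product(m, n):
    """The bracket function per T40-12: m(m-1)...(m-n+1), n factors in all."""
    if n < 0 or m < n:
        raise ValueError("this sketch requires m >= n >= 0")
    result = 1
    for k in range(n):
        result *= m - k
    return result

# An entry of the table T40-11 (m = 9, n = 3).
print(descending_product(9, 3))
# Comparison with the binomial coefficient times n! (m = 8, n = 5).
print(descending_product(8, 5), math.comb(8, 5) * math.factorial(5))
```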


T40-15. For m₁ ≥ m₂,

$\frac{\genfrac{[}{]}{0pt}{}{m_1}{n}}{\genfrac{[}{]}{0pt}{}{m_2}{n}} = \frac{\genfrac{[}{]}{0pt}{}{m_1}{m_1-m_2}}{\genfrac{[}{]}{0pt}{}{m_1-n}{m_1-m_2}}$.

T15 shows a possible transformation for a quotient of two [ ]-expressions
with the same lower argument. Quotients of this kind will often
occur. The transformation is sometimes useful if m₁ − m₂ < n.
T17 and T18 give approximations, based on T5.
T40-17. If m is sufficiently large in relation to n², the following approximations
hold.
a. (1) $\genfrac{[}{]}{0pt}{}{m}{n} \cong (m-n)^n \left(1 + \frac{n(n+1)}{2(m-n)}\right)$.
(2) (Rougher approximation) $\genfrac{[}{]}{0pt}{}{m}{n} \cong (m-n)^n$.
(3) (Still rougher) $\genfrac{[}{]}{0pt}{}{m}{n} \cong m^n$.
b. (1) $\genfrac{[}{]}{0pt}{}{m+n}{n} \cong m^n \left(1 + \frac{n(n+1)}{2m}\right)$.
(2) (Rougher approximation) $\genfrac{[}{]}{0pt}{}{m+n}{n} \cong m^n$.
T40-18. If m is sufficiently large in relation to n, and to n3, the following
approximations hold.
a. ("2") = R.
b. (Rougher approximation) [*}"] =m”.
D40-4.
a. The Normal Function:


$\varphi(u) =_{Df} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$.


(‘π’ and ‘e’ have here their usual mathematical meanings; see remark
on T40-4a.)
b. The Probability Integral:

$\Phi(u) =_{Df} \int_{-\infty}^{u} \varphi(r)\, dr$.


Φ(u) is the probability integral in the form of a cumulative distribution function.
Sometimes the value of the above integral over φ but from −u to u, designated
by ‘α(u)’, or from 0 to u, which is α(u)/2, is given in tables. In earlier
books the following function is used more frequently, likewise under the term
‘probability integral’:

$\Theta(u) =_{Df} \frac{2}{\sqrt{\pi}} \int_{0}^{u} e^{-r^2}\, dr$.

The relation between the three functions is as follows: α(u) = Θ(u/√2) =
2Φ(u) − 1; Φ(u) = ½[1 + α(u)]. Unfortunately, the letters ‘Θ’, ‘φ’, and ‘Φ’
are used by various authors for different functions. We follow Cramér ([Statistics],
p. 557) in the use of ‘φ’ and ‘Φ’.
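
The relations between the three functions can be checked numerically. The following Python sketch is an illustration added here; it assumes that the older function Θ coincides with what Python's standard library provides as `math.erf`, and it expresses φ, Φ, and α through it. The function names are chosen for this sketch.

```python
import math

def normal_function(u):
    """The normal function: exp(-u**2 / 2) / sqrt(2*pi)."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def theta(u):
    """The older 'probability integral'; assumed equal to math.erf."""
    return math.erf(u)

def alpha(u):
    """Integral of the normal function from -u to u."""
    return theta(u / math.sqrt(2))

def probability_integral(u):
    """The probability integral as a cumulative distribution function."""
    return 0.5 * (1 + alpha(u))

for u in (0.0, 1.0, 2.4):
    print(u, round(normal_function(u), 3), round(probability_integral(u), 5))
```

Values computed in this way can be compared with the table T40-20.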


T40-19.
a. φ is an even function, i.e., φ(−u) = φ(u).
b. φ(u) is always positive.
c. lim φ(u) for u → ∞ and u → −∞ is 0.


d. $\frac{d\varphi(u)}{du} = -u\,\varphi(u)$.


g. Φ(0) = 1/2.
h. Φ(−u) = 1 − Φ(u).


T40-20. Table for the Normal Function φ(u) and
the Probability Integral Φ(u)


  u    φ(u)    Φ(u)       u    φ(u)     Φ(u)       u    φ(u)        Φ(u)
 0.0   0.399   0.500     1.1   0.218    0.864     2.4   0.0224      0.99180
 0.1   0.397   0.540     1.2   0.194    0.885     2.6   0.0136      0.99534
 0.2   0.391   0.579     1.3   0.171    0.9032    2.8   0.00792     0.99744
 0.3   0.381   0.618     1.4   0.150    0.9192    3.0   0.00443     0.99865
 0.4   0.368   0.655     1.5   0.130    0.9332    3.2   0.00238     0.99931
 0.5   0.352   0.691     1.6   0.111    0.9452    3.4   0.00123     0.99966
 0.6   0.333   0.726     1.7   0.0941   0.9554    3.6   0.00061     0.99984
 0.7   0.312   0.758     1.8   0.0790   0.9641    3.8   0.00029     0.999928
 0.8   0.290   0.788     1.9   0.0656   0.9713    4.0   0.00013     0.999968
 0.9   0.266   0.816     2.0   0.0540   0.9773    4.5   0.000016    0.9999966
 1.0   0.242   0.841     2.2   0.0355   0.9861    5.0   0.0000015   0.99999971


For negative values: φ(−u) = φ(u); Φ(−u) = 1 − Φ(u).
B. Limit

+D40-5. Let an infinite sequence of elements of any kind be given,
e.g., E₁, E₂, E₃, etc. (so that for every positive integer n there is exactly
one nth member Eₙ in the sequence; but the same element may occur at
different places, e.g., E₁ may be the same as E₃). A subsequence of the
given sequence is a final segment of it =Df it consists of all members of the
given sequence from some one member on, in the original order (e.g.,
E₄, E₅, E₆, etc.).


+D40-6. Let f be a function from natural numbers to real numbers,
which is defined either for all natural numbers or at least for an infinite
subsequence of them, say n₁, n₂, n₃, etc. (Hence, for any natural number
n as argument, for which f is defined, its value f(n) is a real number,
and the values f(0), f(1), f(2), etc., or f(n₁), f(n₂), f(n₃), etc., form an
infinite sequence of real numbers, which we call the f-sequence.)
a. The real number r is the limit of the function f for increasing n, or,
the limit of the f-sequence (in symbols: ‘r = $\lim_{n \to \infty} f(n)$’) =Df for every
positive real number q (however small it may be chosen), there is a
final segment of the f-sequence which lies entirely within the interval
r ± q.

b. The f-sequence (or the function f) is convergent (otherwise, divergent)
=Df there is a real number which is the limit of this sequence.

In the following theorems T21 and T22, ‘lim (..)’ is short for ‘$\lim_{n \to \infty}$ (..)’.

T40-21. Let f₁ and f₂ be convergent functions of the kind described
in D6.


a. lim (f₁(n) + f₂(n)) = lim f₁(n) + lim f₂(n).

b. lim (f₁(n) − f₂(n)) = lim f₁(n) − lim f₂(n).

c. lim (f₁(n) × f₂(n)) = lim f₁(n) × lim f₂(n).

d. If all members of the f₁-sequence or of a final segment of it are equal
to r, then lim f₁(n) = r.

e. If every member of the f₁-sequence or of a final segment of it is equal
to the corresponding member of the f₂-sequence or of a final segment
of it, then lim f₁(n) = lim f₂(n).

f. If for every n in a final segment of the sequence of natural numbers
f₁(n) ≤ f₂(n), then lim f₁(n) ≤ lim f₂(n).

T40-22. Let r and m be constants, i.e., have the same value for all
members of the sequence n = 1, 2, etc. Let r be an arbitrary positive real
number, and m an arbitrary positive integer.

a. lim (r/n) = 0.

b. lim ($r/n^m$) = 0.
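
The definition of limit in terms of final segments can be made concrete for the sequences of T40-22. The following Python sketch is an illustration added here; for given r, m, and q it searches for the first member of the sequence r/nᵐ which lies within 0 ± q. Because the members decrease as n grows, the final segment beginning there lies entirely within that interval. The function name `final_segment_start` is hypothetical.

```python
def final_segment_start(r, m, q):
    """Smallest n >= 1 with r / n**m <= q.

    Since r / n**m decreases as n grows, all later members of the
    sequence also lie within the interval 0 ± q.
    """
    if r <= 0 or q <= 0 or m < 1:
        raise ValueError("this sketch requires r > 0, q > 0, m >= 1")
    n = 1
    while r / n ** m > q:
        n += 1
    return n

# However small q is chosen, a starting point is found.
for q in (0.3, 0.01, 0.0001):
    print(q, final_segment_start(5.0, 2, q))
```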


C. Infinite Cardinal Numbers 


D40-8. Recursive definition for ‘αₙ’. (We shall use only ‘α₀’, ‘α₁’,
and ‘α₂’.)

a. α₀ =Df the cardinal number of the class of natural numbers. A class
or property with the cardinal number α₀ is called denumerable.

b. $\alpha_{n+1}$ =Df $2^{\alpha_n}$.

[Assuming the general continuum hypothesis, from which it follows
that $2^{\alpha_n}$ is the cardinal number next higher than αₙ, our alpha-numbers
α₀, α₁, etc., are the smallest infinite cardinal numbers in order of magnitude,
and hence are the same as Cantor’s aleph-numbers. We prefer
the letter alpha for typographical convenience.]

T40-25.

a. A class or property is denumerable, i.e., it has the cardinal number
α₀, if and only if there is a one-one correlation between its elements
and the natural numbers, in other words, if its elements can be ordered
in an infinite sequence (in the sense of D5) without repetitions.

b. Each of the following classes has the cardinal number α₁: (1) the
class of all classes of natural numbers; (2) the class of all real numbers;
(3) the class of the real numbers in any interval (of positive
length); (4) the class of all points on a straight line. (For this reason,
α₁ is sometimes called the cardinal number of the continuum.)

c. Each of the following classes has the cardinal number α₂: (1) the
class of all classes of real numbers; (2) the class of all functions of
real numbers.


T40-26. Let μ be an infinite cardinal number, and n a positive finite
cardinal number (a natural number > 0).
a. μ + n = μ − n = μ.


b. nμ = μ.
CCE E = ay, 
a We =I. 
e p= p. 


D. Combinatorics 


T40-29. Permutations.
+a. The number of permutations of n elements (i.e., ways of ordering the
elements in a linear order, in other words, finite sequences without
repetitions) is n!.
Explanation. There are n possibilities of choosing an element as the first, then
n − 1 possibilities of choosing one of the n − 1 remaining elements as the second,
then n − 2 possibilities for the third, etc.; hence altogether n(n − 1)
(n − 2) ... = n!.
b. For a given class of n elements, the number of one-one correlations
having the given class both as domain and as converse domain is n!.
(From (a).)
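
For small n the orderings can simply be listed and counted. The following Python sketch is an illustration added here; it uses `itertools.permutations` from the standard library, and the function name `count_permutations` is chosen for this sketch.

```python
from itertools import permutations
import math

def count_permutations(elements):
    """Count the linear orderings of the given elements by enumeration."""
    return sum(1 for _ in permutations(elements))

for elements in ("ab", "abc", "abcd"):
    n = len(elements)
    print(n, count_permutations(elements), math.factorial(n))
```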


The following theorems T31, T32, and T33 refer to the possible ways
for distributing ν (or n) elements among μ (or m) mutually exclusive
properties. Let us call (here only) two such distributions of elements statistically
equal, if the one distribution assigns to each of the properties
the same number of elements as the other distribution; otherwise, statistically
different. If the number of the elements in question is a finite
number n, and if these elements are individuals in one of our systems 𝔏 and
hence are designated by in in 𝔏, and if the m properties are designated
by the molecular predicates of a division in 𝔏 (D25-4), then any distribution
of the n individuals can be described by a sentence in 𝔏 of the kind
which we have called (D26-6a) an individually specified description of a
distribution or, for short, an individual distribution for the n in. Statistically
equal distributions of individuals are described by isomorphic individual
distributions (meaning here sentences) for the in. And the common
statistical features of statistically equal distributions of individuals
are described by a sentence which we have called (D26-6c) a statistical
description of a distribution or, for short, a statistical distribution for the
in. Now the essential point in the earlier definitions mentioned (D26-6a
and c) is that they have been framed in such a way that for every distribution
of the individuals there is exactly one sentence called an individual
distribution for the in. That is the reason why T31c states the same number
as T31a, and T31i the same as T31d; and, further, T32b the same as
T32a, and T32e the same as T32c. Similarly, every one of the statistically
different kinds of statistically equal distributions of individuals is described
by exactly one sentence called a statistical distribution for the in.
Therefore, T33b states the same number as T33a.


T40-31. Individual distributions. Let ν and μ be any finite or infinite
cardinal numbers, and n and m be finite.
+a. The number of possible distributions of ν elements among μ properties
is $\mu^{\nu}$.
Explanation. There are μ possibilities of placing the first element, likewise
μ for the second, etc.; hence altogether μ × μ × ... = $\mu^{\nu}$.


All the following items are simple corollaries of (a).

b. The number of functions which have one of μ values for each of ν
arguments is $\mu^{\nu}$. (From (a).)

c. The number of individual distributions (sentences, in the sense of
D26-6a) in 𝔏 for n given in with respect to a given division of m predicates
is $m^n$. (From (a).)

d. The number of possible distributions of ν elements among two classes
is $2^{\nu}$.

e. The number of functions which have one of two values for each of
ν arguments is $2^{\nu}$. (From (b).)

f. The number of lines in a table with n arguments and two values
(e.g., the truth-values in a truth-table (§ 21B), or the values +
and − in the table of Q-predicates, A31-1) is $2^n$. (From (e).)

g. The number of selections which contain exactly one element from
each of ν mutually exclusive pairs of elements is $2^{\nu}$. (From (d).)

h. The number of subclasses of a given class K with ν elements (including
the null class and K itself) is $2^{\nu}$. (From (d).)
i. The number of individual distributions (sentences) in 𝔏 for n given
in with respect to the division of ‘M’ and ‘~M’ is $2^n$. (From (c).)
k. The number of possible distributions of n elements among m properties
(n ≥ m) such that none of the properties is empty is as follows
if m ≥ 2 (for m = 1, the number is 1):


(1) exactly $\sum_{k=0}^{m-2} \left[(-1)^k \left[(m-k)^n - (m-k)\right] \binom{m}{k}\right]$;


(2) approximatively, if n is not very small: $\sum_{k=0}^{m-2} \left[(-1)^k (m-k)^n \binom{m}{k}\right]$;
(3) rougher approximation for the case that n is large in relation
to m:
$m^n \left[1 - m \left(\frac{m-1}{m}\right)^n\right]$.

l. The number of possible distributions of n elements among m properties
such that p specified properties are empty and the others are
not (n ≥ m − p) is as follows if m − p ≥ 2 (for m − p = 1, the
number is 1):

(1) exactly $\sum_{k=0}^{m-p-2} \left[(-1)^k \left[(m-p-k)^n - (m-p-k)\right] \binom{m-p}{k}\right]$;
(2) approximatively, if n is not very small:
$\sum_{k=0}^{m-p-2} \left[(-1)^k (m-p-k)^n \binom{m-p}{k}\right]$;
(3) rougher approximation for the case that n is large in relation
to m:
$(m-p)^n \left[1 - (m-p) \left(\frac{m-p-1}{m-p}\right)^n\right]$. (From (k).)

m. (1), (2), (3). The number of possible distributions of n elements
among m properties such that exactly p properties (no matter
which ones) are empty (n ≥ m − p) is equal to the number specified
under (l), form (1) or (2) or (3), respectively, multiplied by $\binom{m}{p}$.
(From (l); cf. T32d.)
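
For small numbers the distributions can be enumerated and the counts compared. The following Python sketch is an illustration added here. It represents a distribution of n elements among m properties as a tuple of n property indices. The closed formula in `exact_none_empty` is an assumed reading of T40-31k(1), namely the sum over k from 0 to m − 2 of (−1)ᵏ [(m − k)ⁿ − (m − k)] times the binomial coefficient of m and k. All function names are hypothetical.

```python
from itertools import product
import math

def count_all(n, m):
    """Brute-force count of all distributions of n elements among m properties."""
    return sum(1 for _ in product(range(m), repeat=n))

def count_none_empty(n, m):
    """Brute-force count of distributions in which no property is empty."""
    return sum(1 for d in product(range(m), repeat=n) if len(set(d)) == m)

def exact_none_empty(n, m):
    """Assumed reading of T40-31k(1); intended for m >= 2 and n >= m."""
    return sum((-1) ** k * ((m - k) ** n - (m - k)) * math.comb(m, k)
               for k in range(m - 1))

def rough_none_empty(n, m):
    """Assumed reading of T40-31k(3): m**n * (1 - m * ((m-1)/m)**n)."""
    return m ** n * (1 - m * ((m - 1) / m) ** n)

print(count_all(3, 4), 4 ** 3)
for n, m in ((5, 3), (8, 3)):
    print(n, m, count_none_empty(n, m), exact_none_empty(n, m),
          rough_none_empty(n, m))
```

The brute-force counts grow as mⁿ, so the enumeration is only practical for small n and m.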


T40-32. Individual distributions with given numbers.
+a. The number of those (statistically equal) distributions of n individuals
among m properties which assign n₁ individuals to the first property,
n₂ to the second, ... and nₘ to the mth (where n₁ + n₂ + ...
+ nₘ = n) is $\frac{n!}{n_1!\, n_2! \cdots n_m!}$.


Explanation. If the n individuals are ordered in a sequence, we may decide
to assign the first n₁ individuals to the first property, the next n₂ to the second,
etc., and the last nₘ to the mth. In this way every ordering of the individuals
determines uniquely a distribution with the required numbers n₁, n₂, etc. However,
many different orderings determine the same distribution. If a certain
order of the n individuals is given, we obtain another order determining the
same distribution by rearranging the first n₁ individuals (the number of possible
arrangements for them is n₁!, T29a); then the next n₂ individuals, etc.;
finally the last nₘ individuals. Thus each distribution with the given numbers
is determined by n₁!n₂! ... nₘ! distinct ways of ordering the n individuals.
There are altogether n! such ways (T29a). Therefore the number of distributions
with the given numbers is n!/(n₁!n₂! ... nₘ!).


b. The number of (mutually isomorphic) individual distributions (sentences)
in 𝔏 for n given in with respect to a given division of m predicates
‘M₁’, ‘M₂’, ..., ‘Mₘ’, with the cardinal numbers n₁, n₂, ...,
nₘ, is $\frac{n!}{n_1!\, n_2! \cdots n_m!}$. (From (a).)

c. The number of (statistically equal) distributions of n individuals
among two properties with the cardinal numbers n₁ and n₂ is

$\frac{n!}{n_1!\, n_2!} = \binom{n}{n_1} = \binom{n}{n_2}$. (From (a).)

d. The number of subclasses with n₁ elements of a given class with n
elements (often called combinations of n elements taken n₁ at a time)
is $\binom{n}{n_1}$. (From (c).)

e. The number of (mutually isomorphic) individual distributions (sentences)
in 𝔏 for n given in with respect to the division of ‘M’ and
‘~M’ with the cardinal numbers n₁ and n₂ is $\binom{n}{n_1}$. (From (c).)

f. Let the class K have the infinite cardinal number ν, and let n be a
positive finite number. The number of subclasses of K with n elements
is ν.

g. Let ν be an infinite cardinal number. The number of (statistically
equal) distributions of ν individuals among two classes such that
each class contains ν individuals is $2^{\nu}$.

h. The number of ways of choosing n₁ elements from a given class of n
elements and ordering them (sometimes called permutations of n elements
taken n₁ at a time) is $\binom{n}{n_1}\, n_1! = \frac{n!}{(n-n_1)!} = \genfrac{[}{]}{0pt}{}{n}{n_1}$. (From (d),
T29a.)

i. Let m predicates ‘M₁’, ..., ‘Mₘ’ form a division. Let K be a class
of n individuals, of which n₁ have the property M₁, n₂ M₂, ...,
nₘ Mₘ. Then the number of those subclasses of K which contain s
individuals, and among them sᵢ (i = 1, ..., m) with the property
Mᵢ, is $\binom{n_1}{s_1} \cdots \binom{n_m}{s_m}$. (From (d).)

T40-33. Statistical distributions. 
a. The number of statistically different kinds of distributions of n individuals
among m properties is $\binom{n+m-1}{m-1} = (n + m - 1)!/(n!\,(m - 1)!)$.
Explanation. The distributions in question may be represented by serial
patterns consisting of n dots and m - 1 strokes as follows: the number of
individuals which have the first property is indicated by the number of dots preceding
the first stroke; that of the second property by the dots between the first and
second stroke, etc.; finally, that of the mth property by the dots following the
last stroke. (For example, the pattern ‘... / / . /’ indicates the numbers 3, 0,
1, 0 for the four properties.) Therefore the number sought is equal to the number
of possible patterns with n dots and m - 1 strokes. These patterns may be
produced by starting with a series of n + m - 1 dots and then replacing a subclass
of m - 1 of them by strokes. The number of these subclasses is $\binom{n+m-1}{m-1}$
(T32d); therefore, this is also the number of possible patterns.


b. The number of statistical distributions (sentences, in the sense of
D26-6c) in $\mathfrak{L}$ for n given in with respect to a given division with m
predicates is $\binom{n+m-1}{m-1}$. (From (a).)

c. The number of statistically different distributions of n individuals
among m properties such that none of the properties is empty is
$\binom{n-1}{m-1}$.

Explanation. We first assign to each property one individual. Then the
distributions described are made by distributing the remaining n - m individuals
among the m properties. There are $\binom{(n-m)+m-1}{m-1} = \binom{n-1}{m-1}$ ways of doing this (a).

d. The number of statistically different distributions of n individuals
among m properties such that p specified properties are empty and
the others are not is $\binom{n-1}{m-p-1}$. (From (c).)

e. The number of statistically different distributions of n individuals
among m properties such that exactly p properties (no matter
which ones) are empty is $\binom{m}{p}\binom{n-1}{m-p-1}$. (From (d), T32d.)
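
The formulas for statistical distributions can likewise be checked on small cases. The sketch below is an illustration under the assumption that a statistical distribution is represented as a tuple of m nonnegative cardinal numbers with sum n; the function names are hypothetical.

```python
from itertools import product
from math import comb


def statistical_distributions(n, m):
    """All tuples (n_1, ..., n_m) of nonnegative integers with sum n."""
    return [c for c in product(range(n + 1), repeat=m) if sum(c) == n]


def count_with_empty(n, m, p):
    """Brute force: distributions in which exactly p of the m properties are empty."""
    return sum(1 for c in statistical_distributions(n, m) if c.count(0) == p)


def formula_with_empty(n, m, p):
    """Closed form of item (e); for p = 0 it reduces to item (c).
    Assumes 0 <= p < m and n >= 1."""
    return comb(m, p) * comb(n - 1, m - p - 1)


n, m = 5, 3
# (a): all statistical distributions of 5 individuals among 3 properties
assert len(statistical_distributions(n, m)) == comb(n + m - 1, m - 1)
# (c) and (e): none empty, exactly one empty, exactly two empty
for p in range(m):
    assert count_with_empty(n, m, p) == formula_with_empty(n, m, p)
```

Summing the values of (e) over all p gives back the total of (a), which is a convenient consistency check.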


CHAPTER IV 
THE PROBLEM OF INDUCTIVE LOGIC 


This chapter contains some general, preliminary discussions concerning the 
nature of inductive logic and the problems of its possibility and use. These dis- 
cussions are intended to remove some obstacles and prepare the way for the 
construction of a system of inductive logic, which we shall begin in the next 
chapter. 

Inductive logic is here conceived as the theory of an explicatum for
probability₁. The logical concept of probability₁ as explicandum is explained by
interpreting it not only as evidential support but also as a fair betting quotient
and as an estimate of relative frequency (§ 41). In this connection the problem
of the presuppositions of the inductive method is discussed (§ 41F). The
analogy between probability₁ and probability₂ (relative frequency) is discussed,
and the change in the meaning of the word ‘probability’, which originally had
only the sense of probability₁ and later acquired the second sense of probability₂,
is explained (§ 42). Many philosophers have doubts whether inductive logic, and 
especially quantitative inductive logic, is possible, and some even assert its 
impossibility. Various reasons given for these beliefs are here discussed. They 
are often based on misconceptions of the nature and task of inductive logic. An 
attempt to clarify this nature is made by pointing out the close analogy between 
inductive and deductive logic and the lack of effective procedures for solving 
the chief problems in both these branches of logic (§ 43). A distinction is made 
between logical and methodological problems both for deduction and for induc- 
tion; inductive logic has only the task of solving the logical problems. The prin- 
cipal kinds of inductive inference are explained (§ 44). Against those whose 
opposition to inductive logic is based on their general suspicions against ab- 
stractions, the usefulness and even indispensability of abstractions is empha- 
sized, and it is shown that inductive logic, although based upon a simplified 
schema, is nevertheless applicable to problems in the actual world (§ 45). It 
must be admitted that the scientist’s choice of a suitable hypothesis for the ex- 
planation of observed events is determined by factors of many different kinds. 
However, inductive logic has the task of representing the logical factors only, 
not those of a methodological or practical nature. The assertion that even the 
logical factors are in principle inaccessible to measurement can hardly be
maintained (§ 46). On the other hand, even if we succeed in assigning numerical
values to the logical factors, the task of determining how they should influence
the degree of confirmation c involves great difficulties. Therefore the doubts
whether it is possible to solve the task, to give an adequate definition of c, seem
understandable; however, the attempts so far made to prove the impossibility
fall short of their aim (§ 47). Incidentally, the question is discussed how the
concept of probability₁ is used in practical life and in science; it seems that it is
used in a quantitative way within a much wider domain than the skeptics real- 
ize. This psychological fact concerning the use of the explicandum does not, of 
course, solve the logical problem of the possibility of a quantitative explicatum; 
nevertheless, it may encourage us to look for such an explicatum (§ 48).
Assuming that quantitative inductive logic is possible, could it be usefully applied?
Its application has some essential limitations and involves certain difficulties
which are similar to but still greater than those connected with deductive logic. 
On the other hand, inductive logic can be of great help within the theoretical 
domain of science, especially in cases where statistical descriptions and infer- 
ences are involved. Its development will also help to clarify the foundations of 
induction and thereby of the whole scientific method. Furthermore, inductive 
logic can and must be applied in order to serve, on the basis of our experiences, 
as a “guide of life” (§ 49). The problem of how a rule can be laid down for the
determination of practical decisions with the help of inductive logic is discussed 
in detail. The inductive concept of an estimate plays an important part in a 
rule of this kind (§§ 50, 51). 

In the last part of this chapter some more technical questions concerning c
are discussed. It is explained why we take as arguments of c sentences rather
than propositions or events, as is customary (§ 52). Some conventions are laid
down which state certain fundamental, generally accepted properties of c (§ 53).
With the help of these conventions, it is shown how our problem of defining an
adequate function c for all language systems $\mathfrak{L}$ can be reduced to the problem of
assigning suitable numbers to the state-descriptions ($\mathfrak{Z}$) in the finite systems $\mathfrak{L}_N$.
Further, some additional requirements for c are laid down (§ 54). The results
of these informal considerations are meant merely as signposts to guide our 
steps when, in the next chapter, we shall begin the systematic construction of a 
quantitative inductive logic. 


The Logical Concept of Probability 


Some further explanations are given concerning the meaning of probability₁
as an explicandum. A. In our original explanation, probability₁ was taken as a
measure of evidential support. B. The value of probability₁ for a hypothesis h
may be interpreted as a fair betting quotient for a bet on h. C. Let h be the
prediction that the individual b has the property M; let b belong to the class K;
let the relative frequency of M in K be r. If r is known, then r is the fair betting
quotient for a bet on h. D. If r is not known, then the estimate $r'$ of r is the
fair betting quotient. Since the probability₁ of h was interpreted as the fair
betting quotient, we may in the present case interpret the probability₁ of h as
the estimate of the relative frequency of M in K. In a more general way the
numerical value of probability₁ may be interpreted as the estimate of the
relative frequency of truth among given equiprobable hypotheses. The logical
relation between probability₁ and the general concept of the estimate of a
magnitude (as explicanda) is explained; this relation will later be utilized for the
definition of an explicatum for the concept of estimate (§ 100A). Since
probability₂ means the relative frequency in the long run, the probability₁ of a
singular prediction concerning M may be interpreted as the estimate of the
probability₂ of M. This close relation between the two concepts of probability is the
reason for a far-reaching analogy between certain theorems concerning these
concepts. This relation also gives a psychological explanation for the fact that
many authors since the classical period seem sometimes to shift inadvertently
from probability₁ to probability₂. This is presumably the case when the authors
infer a frequency from a probability or speak about unknown probabilities or
the chance of a certain probability. E. Our conception is in agreement with
Reichenbach’s analysis of his two explicanda, the frequency concept of
probability and the logical concept of probability or weight. But it is not in
agreement with Reichenbach’s explication of the latter concept, because he identifies
this concept (like the former) with relative frequency instead of the estimate of
relative frequency. F. What is needed as a presupposition for the validity of the 
inductive method and the justification of its application in determining practical 
decisions is not the principle of the uniformity of the world but only the state- 
ment that the uniformity is probable on the basis of the available evidence. This 
statement is an analytic statement in inductive logic and hence not in need of 
empirical confirmation. Thus the apparent vicious circle, which many philoso- 
phers believe to be involved in the validation of the inductive method, dis- 
appears. 


We have previously (in chap. ii) distinguished two meanings of the word
‘probability’: the first (‘probability₁’) means weight of evidence or
strength of confirmation, the second (‘probability₂’) means relative
frequency. The chief topic of this book is the problem of an explication of
probability₁. As explained earlier (§ 8), this problem may be approached
on three different levels; we may try to define an explicatum for
probability₁ in any one of the following three forms:

(i) a classificatory concept of confirmation (‘the hypothesis h is
confirmed by the evidence e’);
(ii) a comparative concept of confirmation (‘h is confirmed by e at least
as highly as $h'$ by $e'$’);
(iii) a quantitative concept of confirmation, the concept of degree of
confirmation (‘h is confirmed by e to the degree r’).

If a satisfactory explicatum of the kind (iii) could be found, it would ob- 
viously be the most desirable solution of our problem. A theory of the con- 
cept of degree of confirmation, founded upon an explicit definition of this 
concept, would constitute a quantitative inductive logic. If a satisfactory 
quantitative explicatum is not found or—as some authors believe—can 
never be found, then we should have the more modest task of defining a 
comparative explicatum. This would lead to a comparative inductive logic. 

This chapter will contain preliminary discussions clearing the ground 
for the later construction of a quantitative inductive logic. The nature and 
meaning of probability₁ as an explicandum will be clarified. Some
circumstances will be examined which seem to make the task of a quantitative
explication of probability₁ difficult or, in the opinion of some philosophers,
even insoluble. The possibility of applying inductive logic for the determi- 
nation of practical decisions will be examined. And, finally, some steps will 
be outlined for the construction of inductive logic. In later chapters, sys- 
tems of inductive logic both in a quantitative and in a comparative form 
will be developed. 

For any quantitative explicatum for probability₁ (not only for the one
we shall define later) we use the term ‘degree of confirmation’ or often
briefly ‘confirmation’, when the context makes sufficiently clear that the
degree of confirmation is meant and not the act of confirming; as symbol,
likewise in the metalanguage, we use ‘c’. Thus, ‘c(h,e) = r’ is merely a
shorter formulation for ‘the degree of confirmation (or: the confirmation)
of h on the evidence e is r’; ‘c’ is often also used within a word sentence as
abbreviation for ‘(degree of) confirmation’.

In the present section we shall explain in greater detail the nature and 
meaning of probability₁, the logical concept of probability. These
explanations are not yet meant as an explication but merely as a clarification of
the explicandum. Such a clarification is a necessary preparation for the 
later task of explication. In order to judge whether a proposed concept is 
adequate as an explicatum for a given explicandum, we must be sufficient- 
ly clear as to what we mean by the explicandum. 

The concept of probability₁ will be explained in this section from three
different points of view. The probability₁ of a hypothesis h with respect to
given evidence e represents

(A) a measure of the evidential support given to h by e; 

(B) a fair betting quotient; 

(C) an estimate of relative frequency. 


A. Probability₁ as a Measure of Evidential Support


The first aspect of probability₁ is the one explained earlier (§§ 8-10).
To say that the probability₁ of h on e is high means that e gives strong
support to the assumption of h, that h is highly confirmed by e, or, in
terms of application to a knowledge situation: if an observer X knows e,
say, on the basis of direct observations, and nothing else, then he has good
reasons for expecting the unknown facts described by h.

Although this explanation may be said to outline the primary and
simplest meaning of probability₁, it alone is hardly sufficient for the
clarification of probability₁ as a quantitative concept. For a comparative use,
especially in the simpler cases involving three instead of four arguments
(§ 8, examples (b) and (c)), the explanation seems fairly clear. Scientists
use and understand statements to the effect that one assumption $h_1$ is
more highly confirmed by given observations e than another one $h_2$. But
it is not immediately clear what it should mean to say that $h_1$ is twice as
much confirmed by e as $h_2$; and still less clear what it might mean to
say that the strength of support given to h by e is 3/4 or even that it is 5.
(Why should this not be a possible value?)

One might perhaps say that under certain plausible assumptions the
meanings of numerical values for the strength of support become clear.
Let us assume (i) that this strength is to be measured by nonnegative
numbers $\le 1$, and (ii) that, if two hypotheses $h_1$ and $h_2$ are L-exclusive,
then the support given by e to $h_1 \lor h_2$ is to be measured by the sum of the
numbers which measure the support given by e to $h_1$ and to $h_2$ separately.
If now we know the meaning of the comparative concepts of stronger
support and of equal support, we may obtain an interpretation for numerical
values of the strength of support as follows. Suppose h and ~h are equally
supported by e. Since $h \lor \sim h$ is L-true, no sentence can be more certain
on any evidence. Therefore the strength of support for $h \lor \sim h$ on e must
have the highest possible value, which is 1, according to (i). According to
(ii), this is the sum of the values for h and for ~h separately. Since these
two values are equal, each is 1/2. Similarly, if we have n hypotheses which
are such that necessarily one and only one of them must hold (in technical
terms, they are L-disjunct and L-exclusive in pairs, D20-1e and g) and
which are equally supported by e, then the strength of support by e for
each of them is 1/n, and that for a disjunction $j_m$ of m of them is m/n. To
say of any other hypothesis $h'$ that it is supported by e to the degree m/n
means that $h'$ and $j_m$ are equally supported by e. In this way we obtain an
interpretation for rational numbers of the interval (0, 1) as values of the
strength of support in certain cases and thus of probability₁ as a
quantitative concept.
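
The arithmetic of this interpretation can be written out in a few lines. The following Python sketch assumes the two conditions (i) and (ii) exactly as stated above; the function names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def equal_support(n):
    """Support for each of n hypotheses that are L-disjunct, L-exclusive
    in pairs, and equally supported: the values are equal and, by (i)
    and (ii), sum to 1, so each is 1/n."""
    if n < 1:
        raise ValueError("at least one hypothesis is required")
    return Fraction(1, n)


def disjunction_support(m, n):
    """Support for a disjunction of m of the n hypotheses, obtained by
    adding the separate values as assumption (ii) requires."""
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and n")
    return sum([equal_support(n)] * m, Fraction(0))


# h and ~h equally supported: each has the value 1/2
assert equal_support(2) == Fraction(1, 2)
# two out of six equally supported alternatives: 2/6
assert disjunction_support(2, 6) == Fraction(2, 6)
# the disjunction of all alternatives has the highest value, 1
assert disjunction_support(6, 6) == 1
```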

I think the reasoning just outlined is correct once the assumptions (i) 
and (ii) are accepted. However, with respect to the concept of strength of 
evidential support, these two assumptions are entirely arbitrary. True, it 
is customary to make these assumptions in theories of probability₁, and
we shall make them too in our system to be constructed later. But in order
to show that these assumptions express essential features of probability₁,
we have to go beyond an explanation of this concept as strength of
evidential support. This will be seen by the following discussions of the
second and the third aspects of probability₁.


B. Probability₁ as a Fair Betting Quotient


Since the classical period of the theory of probability, games of chance 
and bets have very frequently served as convenient examples of applica- 
tion and, moreover, have often been used for the purpose of explaining 
the very meaning of the concept of probability in the sense of probability₁.
Among contemporary authors, Borel and Reichenbach especially have
made extensive use of betting situations in the clarification of probability. 

A bet, in the widest sense, may be regarded as a contract between two
partners $X_1$ and $X_2$ to the effect that $X_2$ promises to confer a certain
benefit upon $X_1$ if a certain prediction h is fulfilled, and $X_1$ promises a benefit
to $X_2$ in the case of ~h. We assume that the benefits promised in any bet
by $X_1$ and $X_2$ are amounts of money, $u_1$ and $u_2$, called the stakes. $u_1$ and $u_2$
are nonnegative; in general, both are positive; but we admit also the two
extreme cases that either $u_1$ or $u_2$ is 0, but not both; hence $u_1 + u_2$ is always
positive. We look at the result from the point of view of $X_1$: in the
favorable case, that is, if h is true, he wins the amount $u_2$; if h is false, he loses
$u_1$ or, as we shall say for the sake of a more uniform terminology, he wins
$-u_1$. We call the ratio $u_1 : u_2$ the betting ratio (usually called the odds)
and $u_1/(u_1 + u_2)$ the betting quotient. If the betting quotient q is given,
the betting ratio is obviously q : (1 - q); only this ratio, not the amounts
$u_1$ and $u_2$ themselves, is determined by q. We assume that $X_1$ and $X_2$ pool
their knowledge before they make a bet concerning h; let e express their
common body of information. The statement

‘The probability₁ of h with respect to the evidence e has the value q’
can now be interpreted as saying that a bet on h with a betting quotient q
for the two bettors whose knowledge is e is a fair bet. A bet is fair or
equitable if it does not favor either partner. Therefore the probability
statement means that if a person is permitted to choose either the side of $X_1$
(i.e., betting on h with q) or the side of $X_2$ (i.e., betting on ~h with 1 - q),
one choice is as good as the other. It follows that if a person is offered a
cheaper bet on h, i.e., with a betting quotient less than the probability₁
value q, it is advisable for him to accept it (with a certain qualification to
be explained later); if he is offered a higher bet, it is advisable to reject it.

The interpretation of probability₁ as a fair betting quotient is in accord
with its first interpretation as evidential support, because the stronger the
support given to h by e is, the higher can a bet on h be. But this second
interpretation is more specific than the first because it leads to numerical
values. The question as to how a value of probability₁ as a fair betting
quotient is to be determined has not yet been answered; we shall soon come 
back to it. Nevertheless, we shall see now that the second interpretation 
leads immediately to two simple results concerning the values. 

The stakes $u_1$ and $u_2$ may be any nonnegative numbers. Since q =
$u_1/(u_1 + u_2)$, $0 \le q \le 1$ (if $u_1 = 0$ and $u_2 > 0$, q = 0; if $u_1 > 0$ and
$u_2 = 0$, q = 1). Thus the interpretation of probability₁ as a fair betting
quotient leads to this result:

(1) The values of probability₁ belong to the interval (0, 1), both end
points included.

This justifies the assumption (i) mentioned above under (A).
If $X_1$ bets against $X_2$ on h with q = $u_1/(u_1 + u_2)$, then this is for $X_2$ a
bet on ~h with the betting quotient $u_2/(u_1 + u_2)$ = 1 - q. A bet is fair
if it favors neither partner; therefore a fair bet is fair for both partners.
It follows that, if q is a fair betting quotient for h on e, then 1 - q is a
fair betting quotient for ~h on e. Since the probability₁ of h on e is meant
as a fair betting quotient for h on e, the following holds:

(2) If the probability₁ of h on e is q, the probability₁ of ~h on e is
1 - q.
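
The relations between stakes, betting ratio, and betting quotient, together with the results (1) and (2), can be illustrated by a short Python sketch. The function names are hypothetical; exact fractions are used merely to avoid rounding.

```python
from fractions import Fraction


def betting_quotient(u1, u2):
    """q = u1/(u1 + u2) for nonnegative stakes that are not both 0."""
    u1, u2 = Fraction(u1), Fraction(u2)
    if u1 < 0 or u2 < 0 or u1 + u2 == 0:
        raise ValueError("stakes must be nonnegative and not both zero")
    return u1 / (u1 + u2)


def betting_ratio(q):
    """The betting ratio q : (1 - q), returned as a pair."""
    q = Fraction(q)
    return (q, 1 - q)


q = betting_quotient(3, 1)          # the first bettor stakes 3 against 1
assert q == Fraction(3, 4)
assert 0 <= q <= 1                  # result (1)
# For the second bettor this is a bet on ~h with quotient 1 - q: result (2)
assert betting_quotient(1, 3) == 1 - q
# Only the ratio of the stakes is determined by q, not the amounts themselves
assert betting_quotient(6, 2) == q
assert betting_ratio(q) == (Fraction(3, 4), Fraction(1, 4))
# The two extreme cases
assert betting_quotient(0, 5) == 0 and betting_quotient(5, 0) == 1
```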

C. Probability₁ and Relative Frequency


We have said that probability₁ may be regarded as determining a fair
betting quotient. But the latter concept is itself in need of further
clarification. We shall now try to throw some light on it, at least for the most
important kind of betting situation, namely, the case where the hypothesis h
is a singular sentence saying that a particular individual, say, b, has a 
certain property, say, M. 

In order to judge the fairness or bias of a bet between $X_1$ and $X_2$
concerning h, we regard it as an element of a whole set of n similar bets
concerning the n individuals of a class K, one of which is b. e is supposed to be
such that it does not say for any individual in K whether or not it has the
property M or any other factual property. We consider the case where $X_1$
makes n simultaneous bets with $X_2$; for every individual x in K, $X_1$ bets
$u_1$ against $u_2$, hence with the betting quotient q = $u_1/(u_1 + u_2)$, that x
has the property M. Suppose that actually rn of the n individuals in K
are M, whether the two bettors know it or not; hence the relative
frequency of M in K is r. What will be the final result after all individuals of
K have been observed and all debts paid? $X_1$ wins rn bets; thus he
receives the amount of $rnu_2$. He loses (1 - r)n bets; thus he has to pay
$(1 - r)nu_1$. Therefore, his total balance is
$rnu_2 - (1 - r)nu_1 = n(u_1 + u_2)(r - q)$. Since $u_1 + u_2$ is always positive, $X_1$ will come out
with a gain if q < r; with a loss if q > r; and just even if q = r.
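
The balance formula may be checked by summing the results of the single bets directly. The Python sketch below is an illustration only; the function names and the sample class of ten individuals are assumptions made for the example, and the stakes are taken as integers.

```python
from fractions import Fraction


def total_balance(outcomes, u1, u2):
    """Balance of the first bettor over a set of bets: he wins u2 for
    every individual that is M and loses u1 for every other one."""
    return sum(u2 if is_m else -u1 for is_m in outcomes)


def closed_form(n, r, u1, u2):
    """n(u1 + u2)(r - q) with q = u1/(u1 + u2); u1, u2 are integers."""
    q = Fraction(u1, u1 + u2)
    return n * (u1 + u2) * (r - q)


# A hypothetical class K of 10 individuals, 6 of which are M, so r = 3/5.
outcomes = [True] * 6 + [False] * 4
n, r = len(outcomes), Fraction(6, 10)

for u1, u2 in [(1, 1), (3, 1), (3, 2)]:
    assert total_balance(outcomes, u1, u2) == closed_form(n, r, u1, u2)

assert total_balance(outcomes, 1, 1) > 0   # q = 1/2 < r: gain
assert total_balance(outcomes, 3, 1) < 0   # q = 3/4 > r: loss
assert total_balance(outcomes, 3, 2) == 0  # q = 3/5 = r: just even
```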

Let us assume that $X_1$ is a rational bettor who is not willing to pay a
price merely for the fun of the excitement, as a player in a commercial lot- 
tery does. He makes a bet only if it is not unfavorable in view of his 
chance, and he determines his chance in each case with the help of rational 
inductive methods on the basis of the evidence e available to him. $X_2$ is
likewise supposed to be a rational bettor. How will a bet between the 
two then be made? 

We consider first the case that the common knowledge e contains the
information that exactly rn of the n elements of K are M, although it is
not known which elements are M. It is clear that in this case the two
bettors will not make the set of n bets concerning K with any betting
quotient q different from r. For if q > r, the bet is unfavorable for $X_1$. He will
not make the set of those bets in this case because, as we have seen, that
would with certainty lead to a loss in the final balance. If he makes only
a part of the bets or perhaps only one, then the over-all loss is not certain;
a gain is possible. Nevertheless, $X_1$, being a rational bettor, will not even
conclude one bet with any q greater than the known r because it is
unfavorable for him in this sense: it is one case out of a whole class of logically
similar cases for which the mean result is a loss. Similarly, $X_2$ will not
make a bet with q < r. Thus the only possibility for a bet is one on the
basis of q = r. There is no point for $X_1$ and $X_2$ in making the totality of n
bets with this quotient because the end result is foreseeable with certainty:
neither will gain or lose anything. But they might conclude one bet or
a proper part of all the bets with q = r. In this case the result is uncertain,
as it ought to be in a genuine bet; and the betting quotient is fair, that is,
not clearly favorable to either side.

Thus we have obtained the following result. If the relative frequency 
of M in a class to which b belongs is known to be r, then the fair betting 
quotient for the hypothesis that b is M, and hence the probability₁ of
this hypothesis, is r.


D. Probability₁ as an Estimate of Relative Frequency


Now we shall consider the more frequent and more interesting case that 
the two bettors have no knowledge about the relative frequency r of M 
in K. $X_1$ knows that the final balance for the total class of bets depends
upon this value r. If he knew this value, he would regard it as a fair
betting quotient, as we have seen. Since he does not know the value, he will
try, if possible, to make an estimate of it on the basis of his knowledge e
of observations of other things and regard this estimate as a fair betting
quotient. Since the probability₁ of h on e is intended to represent a fair
betting quotient, it will not seem implausible to require that the
probability₁ of h on e determine an estimate of the relative frequency of M in K.
Thus we shall try to interpret the statement ‘The probability₁ of the
assumption that b is M with respect to the evidence e not mentioning b is q’ as
saying that the estimate with respect to e of the relative frequency of M
in a class K of individuals not mentioned in e is q. However, before we can
accept this interpretation, a closer examination will be necessary. In
particular, we shall have to clarify the concept of an estimate, and then we
must show that the interpretation of probability₁ just given is in accord
with the interpretations given earlier.

To find an estimate $u'$ of the unknown value u of a magnitude on the
basis of given evidence e is an inductive procedure, not a deductive one,
because there is no certainty that the estimate $u'$ is equal or even near to
the actual value u. The concept of estimate is indeed one of the most
important concepts of inductive logic; it will be discussed in detail later
(chap. ix). At the present moment it may be sufficient to indicate briefly
the connection between the general concept of the estimate of a magnitude
and probability₁. Suppose that it is known, either by the definition of the
magnitude in question or by the information e, that there are n possible
values of the magnitude, say, $u_1, u_2, \ldots, u_n$. Then we may take as the
estimate of u with respect to e the weighted mean of these possible values
with probability₁ as weight. This we call the probability₁-weighted mean
or, briefly, the probability₁-mean. (The probability₁-mean is, in the
terminology of the classical theory of probability, the expectation value of
the magnitude.) Hence we define as follows:


(3) The estimate (more explicitly, the probability₁-mean estimate) of
the unknown value of a magnitude with respect to given evidence
e =Df the probability₁-mean, that is, the sum of the products
formed by multiplying each of the possible values of the magnitude
with the probability₁ of its occurrence with respect to e.


Throughout this chapter we shall understand the term ‘estimate’ always 
in the sense defined by (3). Note that (3) gives merely a clarification of the 
term ‘estimate’ as an explicandum, not yet an explication, because the 
term ‘probability₁’ is so far not explicated. (Later we shall explicate
probability₁ by the degree of confirmation c (chap. v) and hence the
probability₁-mean by the c-mean as estimate-function (chap. ix).) As an
example, suppose that the possible gain for $X_1$ in a game or business
venture is known to be either $g_1$ or $g_2$. The actual gain g is unknown. We
assume that $X_1$ is able to determine the value of probability₁ for any
hypothesis with respect to any possible evidence and, in particular, with
respect to the evidence actually available to him. If, with respect to the
available knowledge e, the two possible outcomes have equal probability₁,
the estimate $g'$ of the gain is $g_1/2 + g_2/2 = (g_1 + g_2)/2$, hence the
arithmetic mean. If, however, the probability₁ of $g_1$ is 3/4 and hence that of $g_2$
1/4, then $g'$ is $3g_1/4 + g_2/4$. $g'$ represents for $X_1$ the money value of his
share in the game or business. As a rational man he is not willing to buy 
this share for more than g’ nor to sell it for less. 
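
Definition (3) and the example of the two possible gains can be written out as follows. This is a minimal Python sketch; the function name and the sample amounts 100 and 20 are assumptions chosen for illustration, and the weights are supplied directly because the concept that would determine them has not yet been explicated.

```python
from fractions import Fraction


def estimate(values, weights):
    """Weighted mean in the sense of (3): the sum of the products of each
    possible value with the weight (probability) of its occurrence."""
    if len(values) != len(weights):
        raise ValueError("one weight is needed for each possible value")
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(w * v for w, v in zip(weights, values))


g1, g2 = 100, 20  # hypothetical possible gains

# Equal weights: the estimate is the arithmetic mean (g1 + g2)/2.
assert estimate([g1, g2], [Fraction(1, 2), Fraction(1, 2)]) == Fraction(g1 + g2, 2)

# Weights 3/4 and 1/4: the estimate is 3*g1/4 + g2/4.
assert estimate([g1, g2], [Fraction(3, 4), Fraction(1, 4)]) == Fraction(3 * g1 + g2, 4)
```

Exact fractions are used so that the check that the weights sum to 1 is not disturbed by rounding.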

Let us now apply the concept of estimate as defined by (3) to the set of 
n bets described under (C). We found that if $X_1$ makes these bets with the
betting quotient q (= $u_1/(u_1 + u_2)$) and the relative frequency of M in
K is r, then his total gain g (positive or negative) will be $n(u_1 + u_2)(r - q)$.
There are n + 1 possible values of the number m of individuals in K
which are M (0, 1, 2, ..., n) and hence n + 1 possible values of r = m/n
and of g = $n(u_1 + u_2)(r - q)$. Each of these n + 1 possible cases has a
certain probability₁ with respect to e. Thus $X_1$ can determine, according
to (3), the estimate $m'$ of m, the estimate $r'$ of r, and the estimate $g'$ of g
with respect to e. It can easily be shown on the basis of the definition (3)
that, no matter what the particular probability₁ values are, the following
equations hold:

(4) $r' = m'/n$;
(5) $g' = n(u_1 + u_2)(r' - q)$.

[The reason is that r is a linear function of m, and g is a linear function of
r; cf. T100-5 on the basis of D100-1.] Consequently, $X_1$ will reject any
offered bet with a betting quotient q > $r'$ because the estimate $g'$ of his
gain would be negative; he may accept a bet with q < $r'$. Thus the
situation here is similar to that discussed earlier in which r was known, but it
is not quite the same. In the former situation $X_1$ knew that the total set
of bets with q = r will leave him without gain or loss, and the set with
q < r will result in a final gain. In the present situation, however, the
result of the total set of bets with q = $r'$ cannot be foreseen; the estimate $r'$
of the relative frequency may be greater than its actual value r, and in
this case the total result will be a loss. But there is also the possibility
of a gain. Thus, in the present situation, not only the outcome of a single
bet or a few bets is uncertain, but even that of the total set of bets. Since,
however, uncertainty is of the essence of a bet, this fact alone will not
deter $X_1$ from betting, provided the conditions of the bet are not
unfavorable to him. They are unfavorable to him if q > $r'$, and unfavorable
to $X_2$ if q < $r'$; they are neither favorable nor unfavorable to either side
only if q = $r'$. The bet is fair if and only if the estimate of the gain is zero
for both partners; and this is the case if and only if the betting quotient
q for h is equal to $r'$:

(6) For a bet on the singular prediction that an individual belonging
to a class K of unknown individuals has the property M on the
basis of the available knowledge e, the fair betting quotient is the
estimate of the relative frequency of M in K on the basis of e.
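
That equations (4) and (5) hold for any assignment of weights can be checked numerically. In the following Python sketch the weights for the n + 1 possible values of m are chosen arbitrarily and are an assumption of the example, as are the function names; the estimate is the weighted mean of definition (3).

```python
from fractions import Fraction


def estimate(values, weights):
    """Weighted mean as in (3); the weights are assumed to sum to 1."""
    return sum(w * v for w, v in zip(weights, values))


def bet_estimates(n, u1, u2, weights):
    """Estimates m', r', g' for n bets with integer stakes u1 against u2,
    given one weight for each possible number m = 0, ..., n."""
    q = Fraction(u1, u1 + u2)
    ms = list(range(n + 1))
    m_est = estimate(ms, weights)
    r_est = estimate([Fraction(m, n) for m in ms], weights)
    g_est = estimate([n * (u1 + u2) * (Fraction(m, n) - q) for m in ms], weights)
    return m_est, r_est, g_est


n, u1, u2 = 4, 3, 1
q = Fraction(u1, u1 + u2)
# Arbitrary weights for m = 0, 1, 2, 3, 4; they sum to 1.
weights = [Fraction(k, 15) for k in (1, 2, 3, 4, 5)]

m_est, r_est, g_est = bet_estimates(n, u1, u2, weights)
assert r_est == m_est / n                        # equation (4)
assert g_est == n * (u1 + u2) * (r_est - q)      # equation (5)
# Here q = 3/4 exceeds r' = 2/3, so the estimated gain is negative.
assert q > r_est and g_est < 0
```

Replacing the weights by any other nonnegative fractions with sum 1 leaves the two equations intact, which is the point of the remark on linearity.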


Here, however, two difficulties seem to appear. Suppose that $X_1$
considers a bet with $X_2$ on the hypothesis that an unknown individual b is M
on the basis of their common knowledge e. He asks what would be a fair
betting quotient for h on e. Let us assume that $X_1$ knows how to
determine values of probability₁ and hence also, according to (3), estimates
of relative frequency. Our first answer is given by (6): take a class K of n
unknown individuals containing b and determine the estimate of the
relative frequency of M in K; this is a fair betting quotient. Here the first
difficulty arises: which number n should $X_1$ choose, and which class of n
individuals? What if the estimate has different values for different classes? 
Now it can be shown that the latter case is impossible, because the fol- 
lowing holds: 


(7) For any given evidence e and any given molecular property M, the 
estimate (probability₁-mean) of the relative frequency of M in a 
non-empty class K has always the same value no matter how many 
and which individuals belong to K, provided only that e does not 
say anything about these individuals. 


We shall later prove an important theorem (T106-1d) to the effect that 
the independence stated in (7) holds generally for a comprehensive class of 
functions (called symmetrical c-functions) containing among others all 
those functions which can be considered as adequate explicata of 
probability₁. Thus X₁ will find one value as estimate of the relative frequency 
of M within any class K of unobserved individuals, no matter whether K 
is small or consists of the total unobserved part of the universe. 

The second difficulty seems to arise from the fact that we have given 
two different rules for the determination of a fair betting quotient for 
h on e: this quotient was equated in (6) to the estimate of the relative 
frequency but, earlier, to the probability₁ of h on e. Now it can be shown 
that these two values always coincide: 


(8) Let e be any (non-L-false) evidence, M any molecular property, 
b an individual and K any class of individuals not mentioned in e, 
and h the hypothesis to the effect that b is M, then the estimate 
(probability₁-mean) of the relative frequency of M in K is equal 
to the probability₁ of h on e. 


This holds likewise generally for the class of functions mentioned above. 


It is easily seen that (8) follows from (7). Let r′ be the estimate of the relative 
frequency in K, and r″ that in {b}, the class consisting of b alone. Then, 
according to (7), r′ = r″. The relative frequency in {b} has only two possible 
values: 1 if h is true, 0 if ~h is true. Therefore, according to (3), r″ = 1 × 
probability₁ of h on e + 0 × probability₁ of ~h on e = probability₁ of h on e. 
Hence r′ = probability₁ of h on e. This is (8). 
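
Statements (7) and (8) can be illustrated by a small computation. The sketch below assumes, purely for illustration, one particular symmetrical rule: the classical rule of succession, under which the probability that the next unobserved individual is M is (a + 1)/(t + 2) when a of the t observed individuals are M. This rule, the function names, and the numerical evidence are assumptions of the example and are not taken from the text.

```python
from fractions import Fraction

def p_next_is_M(a, t):
    """Assumed symmetrical rule (rule of succession): probability that the
    next individual is M, given a M's among t observed individuals."""
    return Fraction(a + 1, t + 2)

def estimate_rel_freq(a, t, n):
    """Probability-mean of the relative frequency of M in a class of n
    unobserved individuals, given a M's among t observed ones."""
    # dist[k] = probability that exactly k of the individuals taken so far are M
    dist = {0: Fraction(1)}
    for step in range(n):
        new = {}
        for k, p in dist.items():
            pm = p_next_is_M(a + k, t + step)
            new[k + 1] = new.get(k + 1, 0) + p * pm
            new[k] = new.get(k, 0) + p * (1 - pm)
        dist = new
    return sum(Fraction(k, n) * p for k, p in dist.items())

a, t = 3, 10                      # hypothetical evidence: 3 of 10 observed are M
singular = p_next_is_M(a, t)      # probability of the singular hypothesis
estimates = [estimate_rel_freq(a, t, n) for n in (1, 2, 5, 20)]
```

Under this assumed rule the estimate is the same for every class size tried, and it coincides with the probability of the singular hypothesis, as (7) and (8) require.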


The result (8) justifies the earlier interpretation mentioned tentatively: 
the probability₁ of a singular hypothesis concerning M can be interpreted as 
the estimate of the relative frequency of M in an unknown class K. The re- 
sult (8) is, in fact, a special case of the following: 


(9) Let e be any (non-L-false) evidence and 𝔎ᵢ be any non-null class 
of sentences each of which has the same probability₁-value q with 
respect to e. Then the estimate of the relative frequency of true 
sentences in 𝔎ᵢ is equal to q. 


The later theorems corresponding to (8) and (7) (T106-1c and d) will 
be derived from a much more general theorem (T104-2c) which corre- 
sponds to (9). While (8) concerns individuals and one given property M, 
in other words, hypotheses which are full sentences of the same predicate 
‘M’ differing only in the individual constants occurring, (9) refers to a class 
of sentences without restriction; these sentences may have any forms 
whatever, and there may be deductive relations between them (e.g., 
L-implication, L-exclusiveness, or even L-equivalence). The result (9) 
can be used to explain probability₁ as a quantitative concept in terms of 
the following two concepts: (1) probability₁ as a comparative concept 
and, in particular, the relation of one hypothesis h₁ being equally probable 
to another one h₂ with respect to the same evidence, and (2) the concept 
of estimation, in particular, the estimate of the frequency of truth 
with respect to a given evidence e. Suppose that X understands these two 
concepts as explicanda; that is to say, he knows roughly what he means by 
them, although he may not be able to explicate them, i.e., to give exact 
rules for their use. Then, with the help of (9), we can explain to him 
probability₁ as a quantitative explicandum in the following way: if you 
have a class of s hypotheses which have equal probability₁ on e, then take 
as numerical value for the probability₁ of each of them the estimate of the 
relative frequency of truth among them (in other words, the estimate of 
the number of true sentences in the given class, divided by s). Thus the 
common probability₁ value of several hypotheses can be interpreted as the 
estimate of the relative frequency of truth among them. 
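
The content of (9) can be made concrete with a toy model. The sketch below is an illustration under assumed conditions: six equally weighted possible cases stand in for the evidence, and each sentence is represented by the set of cases in which it is true. The weights, the sentences, and the function names are hypothetical. The sentences are chosen so that deductive relations hold among them (two are L-equivalent, two are L-exclusive), which (9) explicitly permits.

```python
from fractions import Fraction

worlds = range(1, 7)                        # six equally weighted possible cases
weight = {w: Fraction(1, 6) for w in worlds}

# Each sentence is represented by the set of cases in which it is true.
sentences = [
    {1, 2},      # h1
    {1, 2},      # h2, L-equivalent to h1
    {2, 3},      # h3, compatible with h1 but not implied by it
    {5, 6},      # h4, L-exclusive with h1
]

def probability(sentence):
    """Sum of the weights of the cases in which the sentence is true."""
    return sum(weight[w] for w in sentence)

def estimate_truth_frequency(sentences):
    """Probability-mean of (number of true sentences) / s."""
    s = len(sentences)
    return sum(
        weight[w] * Fraction(sum(w in h for h in sentences), s)
        for w in worlds
    )

q_values = {probability(h) for h in sentences}   # all sentences share one value
estimate = estimate_truth_frequency(sentences)
```

All four sentences have the common value 1/3, and the estimate of the relative frequency of truth among them is likewise 1/3, irrespective of the deductive relations between them.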

In the foregoing discussions we have interpreted the concept of 
probability₁ in terms of an estimate of relative frequency, either of a property 
M among given individuals or of truth among given sentences. These 
estimates are special cases of the general concept of estimate; and this 
concept again was explained in terms of probability₁. In a system of 
definitions a circular procedure of this kind would, of course, be inadmissible. 
But our present discussions aim only at a clarification of certain concepts 
as explicanda. In such a clarification it is not only admissible but expedient 
to go back and forth and in circles, illuminating the network of concepts 
by analyzing the logical relations holding between any two of them. In 
the later construction of a system containing explicata of those explicanda, 
a chain of definitions not involving any circle will be built up. First, the 
regular c-functions (confirmation-functions) will be defined (§ 55A); they 
comprehend possible explicata for probability₁. With their help, a general 
concept of an estimate-function (‘c-mean estimate’) will be introduced by 
a definition (D100-1) which corresponds to (3) above. The relations 
between the degree of confirmation and the estimate of relative frequency 
will then be stated by theorems (T104-2c and T106-1c) corresponding to 
(9) and (8). 

If we take a sufficiently large unknown class K, then the relative 
frequency of M in K may be regarded as representing the relative frequency 
“in the long run”. But this is the explicandum of probability₂, the 
statistical concept of probability. Thus we find an important connection between 
the two probability concepts: in certain cases probability₁ may be regarded 
as an estimate of probability₂. 

The relation between probability₂ and probability₁ is hence seen to be 
a special instance of the logical relation which holds generally between 
an empirical, e.g., physical, quantitative concept and the corresponding 
inductive-logical concept of its estimate with respect to given evidence. 
This relation explains, on the one hand, the different nature of the two 
probability concepts, but, on the other hand, also the far-reaching anal- 
ogy between them which we shall repeatedly observe in our further 
discussions. 

The interpretation of probability₁ as an estimate of relative frequency 
for future observations may help us in clearing up a problem which has 
been much discussed since classical times. Consider the following three 
sentences: 

(i) ‘The available knowledge e contains the information that this die 
has symmetrical shape, and hence in geometrical respects its six 
sides are alike. e does not contain any information concerning other 
respects in which the sides may differ.’ 

(ii) ‘The probability that any future throw of this die will yield an ace 
is 1/6.’ 

(iii) ‘If a sufficiently long series of throws of this die is made, the 
relative frequency of aces will be 1/6.’ 


The problem is whether (iii) can be inferred from (ii). Earlier authors 
have sometimes made inferences of this kind from probability to relative 
frequency. They meant the term ‘probability’ in (ii) in the sense of 
probability₁ with respect to the evidence e characterized in (i); this 
interpretation is clear from their reference to symmetry. On the basis of this 
interpretation, however, no valid inference can lead from (ii) to (iii), because 
the statement (ii) is purely logical while (iii) is factual. Later authors 
have correctly criticized inferences of this kind. This was first done by 
Mises (1919), who later said concerning the invalid inference just 
described: “I still believe that unearthing the fallacy of the classical 
argument is the cornerstone of what is called the frequency theory of 
probability” [Comments 2]. 

On the other hand, let us modify the inference by taking either of the 
following two statements instead of (iii): 


(iv) ‘The estimate of the relative frequency of aces in any future series 
of throws of this die is 1/6.’ 

(v) ‘The probability₁ of the prediction that the relative frequency of 
aces in a future series of throws of this die will be within the small 
interval 1/6 ± ε is high (and can even be brought as near to 1 as 
wanted) if the series is made sufficiently long.’ 


(iv) does indeed follow from (ii), as is seen from our previous discus- 
sion. According to classical conceptions, also (v) follows from (ii) in virtue 
of Bernoulli’s theorem. (This theorem will be discussed later (§ 96); we 
shall see that it can be applied only under certain restricting conditions, 
which may make its use in the above example questionable; but we may 
leave this problem aside for our present discussion.) Now the inferences 
in question made by earlier authors are usually not formulated in very 
clear and unambiguous terms. The conclusion is seldom formulated in a 
way similar to (iii). Sometimes phrases are used like ‘we may anticipate’ 
or ‘it is to be expected’ or something similar. In these cases it might not 
be implausible to assume that what the author actually meant is not a 
factual assertion like (iii) but an inductive statement concerning either 
an estimate like (iv) or a high probability₁ like (v). If so, the author cannot 
be accused of committing the fallacy earlier explained. Those cases in 
which the fallacy of inferring (iii) is actually committed can now be 
explained psychologically: they arise from a confusion of an estimate of 
frequency with the frequency itself. 
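
The classical reasoning behind statement (v) can be sketched numerically. The example below assumes independent throws, each with probability 1/6 of an ace; this independence is an assumption of the illustration (the text notes that Bernoulli's theorem applies only under restricting conditions). The function name, the width ε = 1/20, and the series lengths are hypothetical choices.

```python
from fractions import Fraction
from math import comb

def prob_freq_within(n, p, eps):
    """Probability that the relative frequency k/n lies in [p - eps, p + eps],
    assuming n independent trials each with probability p."""
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) <= eps:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

p = Fraction(1, 6)
eps = Fraction(1, 20)
# Exact binomial sums, converted to floats only for display.
results = {n: float(prob_freq_within(n, p, eps)) for n in (60, 300, 1500)}
```

Under the stated assumption the computed value grows toward 1 as the series is lengthened, yet for every finite length it remains a probability statement about the frequency and never the factual assertion (iii).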

The difference between probability₁ and probability₂ may be further 
elucidated by analyzing the sense of the customary references to unknown 
probabilities. The value of a certain probability₂ may be unknown to us 
at a certain time in the sense that we do not possess sufficient factual 
information for its calculation. On the other hand, the value of a 
probability₁ for two given sentences cannot be unknown in the same sense. (It 
may, of course, be unknown in the sense that a certain logicomathematical 
procedure has not yet been accomplished, that is, in the same sense in 
which we say that the solution of a certain arithmetical problem is at 
present unknown to us.) As we have seen earlier (§ 12B), the classical 
authors on probability deal, on the whole, with probability₁. However, 
they sometimes refer to unknown probabilities or to the probability (or 
chance) of certain probability values, e.g., in formulations of Bayes’ 
theorem. This would not be admissible for probability₁. Perhaps the authors 
here inadvertently go over to probability₂. Since a probability₂ value for 
a given case is a physical fact like a temperature, we may very well inquire 
into the probability₁, on a given evidence, of a certain probability₂. 
However, a question about the probability₁ of a probability₁ statement has no 
more point than a question about the probability₁ of the statement that 
2 + 2 = 4 or that 2 + 2 = 5, because a probability₁ statement is, like 
an arithmetical statement, either L-true or L-false; therefore its 
probability₁, with respect to any evidence, is either 1 or 0. 


E. Some Comments on Other Conceptions 


On the basis of the preceding discussions it will now be possible to 
clarify the relation between our conception of probability₁ and 
Reichenbach’s conception. Since Reichenbach is one of the leading representatives 
of the frequency conception, it might at first appear as if our views must 
be fundamentally opposed. However, a closer examination of Reichen- 
bach’s argumentation shows that the two points of view are actually 
quite close to each other. As long as Reichenbach discusses the two expli- 
canda of probability before he proposes his explicatum, our views are in 
agreement on all basic points. He explains that there are two forms of 
probability or two kinds of application ([Experience] § 32). The one is 
the frequency concept, our probability₂. The other is called by him the 
“logical concept of probability” or “weight”. When we see that he refers 
to it also as “predictional value” (op. cit., p. 315) and says that it is de- 
termined not only by the event in question but “also by the state of our 
knowledge”, it becomes clear that this explicandum is the same as, or 
something similar to, our probability₁. Now it is interesting to see that 
Reichenbach’s analysis of this concept and its function in determining 
decisions, especially in the case of wagers, leads him to the procedure of 
estimation (“appraising”); thus he comes very close to our interpretation 
of probability₁. He distinguishes between the actual value and an estimate 
(“appraisal”) of a magnitude, e.g., the funds needed for a new factory 
or the spatial distance estimated by an artillery officer (p. 319). This 
analysis is then applied to the case of a wager. “The man who bets on the 
outcome of a boxing match, or a horse race, or a scientific investigation 
. . . makes use of such instinctive appraisals of the weight; the height of 
his stakes indicates the weight appraised.” From his preceding discussions 
it is clear that the magnitude to be estimated in these cases is the rela- 
tive frequency of events of the kind in question within a reference class 
to which the event referred to in the bet belongs. Therefore, the statement 
quoted may be understood as saying that the bettor’s estimate of this 
relative frequency determines the betting quotient at which he is willing 
to make the bet. Thus it seems that Reichenbach is aware of the distinc- 
tion between the actual relative frequency in the future, which is unknown 
at present, and the estimate of it, and that he recognizes that it is the 
latter, not the former, which determines the bettor’s decision concerning 
a betting quotient. Up to this point our views agree. But now Reichenbach 
takes a step which marks the parting of our ways. After identifying 
probability, in the sense of probability₂, with relative frequency, he declares 
that weight, that is, probability₁, must likewise be explicated by 
identifying it with relative frequency. It seems to me that it would be more in 
accord with Reichenbach’s own analysis if his concept of weight were 
identified instead with the estimate of relative frequency. If Reichenbach’s 
theory is modified in this one respect, our conceptions would agree in all 
fundamental points. 

Reichenbach criticizes the logical concept of probability, that is, 
probability₁, in the forms in which it has been proposed and systematized by 
Laplace and Keynes. It must be admitted that some of his objections are 
correct. However, Reichenbach cannot reject the concept of probability₁ 
in our interpretation either because of its alleged apriorism or for any 
other reason, because this concept, at least in certain cases of application, 
coincides with a concept used by Reichenbach himself, namely, the con- 
cept of an estimate of relative frequency. His own detailed and illuminat- 
ing discussions of the role of inductive thinking, both in science and in 
everyday life, make it clear how important a systematic theory of estima- 
tion and, in particular, of the estimation of relative frequency would be. 
In our conception this is one of the tasks of inductive logic. If Reichenbach 
were to add such an inductive theory of estimation to his theory of fre- 
quency, then, but not otherwise, his system would become complete. This 
follows from a consistent development of his own basic conception. 

Some philosophers believe that the logical concept of probability₁ 
supersedes the concept of truth. They regard the latter concept as an 
illegitimate idealization; instead of saying that a given statement is true we 
should say more correctly that it is highly confirmed or highly probable. 
In a similar way Reichenbach ([Experience] §§ 22, 35) believes that the 
values of probability (the logical concept of probability₁) ought to take 
the place of the two truth-values, truth and falsity, of ordinary logic, or, 
in other words, that probability logic is a multivalued logic superseding 
the customary two-valued logic. I think that these views are based on a 
lack of distinction between ‘true’, on the one hand, and ‘known to be true’, 
‘absolutely certain’, ‘completely verified’, ‘confirmed to the maximum 
degree’, ‘having the probability₁ 1’, on the other. The concept expressed by 
the latter phrases in their strictest sense is indeed an absolutistic concept 
that should be replaced by the concept of probability₁ with its continuous 
scale of degrees. Both these concepts refer to given evidence; the concept 
of truth, however, does not and thus is seen to be of an entirely different 
nature, and, hence, values of probability₁ are fundamentally different from 
truth-values. Therefore, inductive logic, although it introduces the 
continuous scale of probability₁ values, remains two-valued, like deductive 
logic. While it is true that to the multiplicity of probability₁ values in 
inductive logic only a dichotomy corresponds in deductive logic, nevertheless 
this dichotomy is not between truth and falsity of a sentence but 
between L-implication and non-L-implication for two sentences. If, for 
example, the probability₁ of h on e is 2/3, then h is still either true or false 
and does not have an intermediate truth-value of 2/3. [For more detailed 
discussions on the relations and the distinctions between truth, 
verification, and probability, see [Concepts] § VI and [Remarks] § 3.] 


F. Presuppositions of Induction 


The concept of probability₁ and the concept of estimation based on 
probability₁ not only are of theoretical interest but are also essential for 
those deliberations which are intended to guide our practical decisions. 
We have discussed the relevance of probability₁ and of an estimate of 
relative frequency for judging whether a proposed bet is fair or not. Later 
we shall show in detail how values of probability₁ or of estimates of 
various magnitudes may be used in determining practical decisions (§§ 50, 51). 
Leaving the technical details of this procedure for the later discussion, we 
shall at present examine its validity and presuppositions. Let us assume 
that a man X generally decides his actions in accordance with the proba- 
bilities of relevant predictions with respect to the observational evidence 
available to him. Is this an arbitrary habit, or can we give a justification 
for this general way of procedure? Can X be sure that his activities if 
determined in this way will be successful? 
Suppose that X would like to know whether the prediction 


(1) ‘It will rain tomorrow’ 


is true or false, because this is relevant for a practical decision he has to 
take now. Some reflection will show him that for questions of this kind 
certainty is not attainable but only probability. Thus he will be content 
to take the following statement (2) instead of (1) as a basis of his decision: 


(2) ‘With respect to the available evidence, the probability₁ that it 
will rain tomorrow is high’. 


This is all he can know at the present moment. And it is sufficient as a 
basis of his decision. For example, he may decide to take his umbrella 
along; or, if the probability is numerically determined, say as 4/5, he 
may decide to make a bet with this value as the betting quotient. X is 
aware that he cannot be sure that the action thus determined will be suc- 
cessful. It may be that the event predicted with high probability will not 
occur. But is he perhaps right in expecting success in the average of a long 
series, though not in each single case? He asks himself whether there are 
good reasons for accepting the following prediction: 


(3) ‘If X continues to make decisions with the help of the inductive 
method, that is to say, taking account of the values of probability₁ 
or estimation with respect to the available evidence, then he will 
be successful in the long run. More specifically, if X makes a 
sufficiently long series of bets, where the betting quotient is never higher 
than the probability₁ for the prediction in question, then the total 
balance for X will not be a loss.’ 


If X could know this, then he would clearly be justified in following the 
inductive method. It is clear that the truth of (3) is not logically necessary 
but depends on the contingency of facts. Statements like (3) which assert 
success in the long run for the inductive method would be true if the world 
as a whole had a certain character of uniformity to the effect, roughly 
speaking, that a kind of events which have occurred in the past very 
frequently under certain conditions will under the same conditions occur 
very frequently in the future. Therefore many philosophers have asserted 
that the assumption of the uniformity of the world is a necessary presup- 
position for the validity of inductive inferences (probability inferences) 
and hence for the justification of applying the inductive method in the 
determination of practical decisions. Among the many different 
formulations of this principle of uniformity, which are similar to each other but not 
necessarily logically equivalent, two may be given here: 


(4) ‘The degree of uniformity of the world is high.’ 

(5) ‘If the relative frequency of a property in a long initial segment of 
a series is high (say, r), then it will likewise be high (approximately 
equal to r) in a sufficiently long continuation of the series.’ 


We give to the principle of uniformity the form (4) rather than the 
customary one: ‘The world is uniform’, because it is preferable to use the 
concept of uniformity in a quantitative form instead of the usual 
classificatory form, as we shall see later. The question as to whether the principle 
of uniformity is true and, if so, whether and how we can know it, has been 
much discussed by philosophers. There is no doubt that the principle is 
synthetic, that it makes a factual assertion about the world; it is con- 
ceivable that it is false, that is, that the world is chaotic or at least that 
it has a low degree of uniformity. Many philosophers maintain that the 
principle is fundamentally different from other factual hypotheses about 
the world, e.g., physical laws. The latter hypotheses can be empirically 
tested on the basis of observational evidence and thereby either con- 
firmed or disconfirmed inductively. But any attempt to confirm inductive- 
ly the principle of uniformity would contain a vicious circle, according to 
these philosophers, because the inductive method presupposes this prin- 
ciple. Some of these philosophers conclude that skepticism is the only 
tenable position: we have to reject the validity of inductive inference. 
Other philosophers maintain that we must abandon the principle of em- 
piricism which says that a synthetic statement can be accepted only if it 
is empirically confirmed. 

Are these conclusions actually inescapable? Let us examine what kind 
of assurance would justify X’s implicit habit or explicit general decision 
to determine all his specific decisions with the help of the inductive meth- 
od. We can easily see that he need not know with certainty that this pro- 
cedure will be successful in the long run; it would be sufficient for him to 
have the assurance that success in the long run is probable. Just as in the 
case of the prediction of a single event it was clear that only probability 
but not certainty can be obtained and that probability gives a sufficient 
basis for the specific decision, thus analogously for the question of success 
in the long run it would suffice for X to obtain, instead of the earlier 
statement (3), an inductive statement either in terms of probability₁ like (6a) or 
in terms of an estimate like (6b): 


(6a) ‘If X makes a long series of bets such that the betting quotient is 
never higher than the probability₁ for the prediction in question, 
then it is highly probable that the total balance for X will not be 
a loss.’ 

(6b) ‘If X makes a long series of bets as described, then the estimate of 
his total balance will not be negative.’ 
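
Statement (6b) rests on simple arithmetic, which the following sketch spells out. It is an illustration under assumptions, not part of the original text: each bet is described by a hypothetical triple (p, q, stake), where p is the probability of the prediction, q the betting quotient, and stake the sum of both partners' stakes; the function name and the sample bets are invented for the example.

```python
from fractions import Fraction

def estimate_of_balance(bets):
    """Sum over all bets of the probability-mean of X's gain.
    Each bet is (p, q, stake): X risks q * stake to win (1 - q) * stake."""
    total = Fraction(0)
    for p, q, stake in bets:
        gain_if_true = (1 - q) * stake    # partner's stake, won if the prediction holds
        loss_if_false = q * stake         # X's own stake, lost otherwise
        total += p * gain_if_true - (1 - p) * loss_if_false
    return total

F = Fraction
# Betting quotient never higher than the probability of the prediction.
cautious = [(F(4, 5), F(3, 4), 20), (F(1, 2), F(1, 2), 10), (F(1, 3), F(1, 4), 12)]
# Betting quotient higher than the probability of the prediction.
reckless = [(F(4, 5), F(9, 10), 10)]

cautious_estimate = estimate_of_balance(cautious)
reckless_estimate = estimate_of_balance(reckless)
```

Each term reduces to stake × (p − q), so the estimate cannot be negative as long as no betting quotient exceeds the corresponding probability. The computation says nothing about the actual balance, which depends on the facts.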


It seems that, indeed, many contemporary philosophers, perhaps the 
majority, in contradistinction to those of the last century, agree that 
probability of success in the long run would be sufficient for the validity of 
inductive inference. Accordingly, it is agreed that what is needed as a 
presupposition of the validity of the inductive method is not certainty 
of the uniformity of the world but only probability. Therefore we replace 
now the earlier statements (4) and (5) by corresponding inductive state- 
ments (7) and (8); we formulate each of them again in two alternative 
ways, in terms of a probability or an estimate: 


(7a) ‘On the basis of the available evidence it is very probable that the 
degree of uniformity of the world is high.’ 

(7b) ‘On the basis of the available evidence, the estimate of the degree 
of uniformity of the world is high.’ 

(8a) ‘On the basis of the evidence that the relative frequency of a prop- 
erty in a long initial segment of a series is high (say, r), it is very 
probable that it will likewise be high (approximately equal to r) in 
a long continuation of the series.’ 

(8b) ‘On the basis of the evidence described, the estimate of the relative 
frequency in a continuation of the series is likewise high (has a 
certain value near to r).’ 


These are alternative formulations for the principle which is needed as a 
presupposition for the validity of the inductive method. This means that 
a demonstration or confirmation of this principle would constitute a 
justification for the inductive method. Some of those philosophers who 
agree that the principle need not assert uniformity with certainty but 
merely with probability believe nevertheless that the difficulty earlier de- 
scribed remains essentially the same. The statement of the probability 
of uniformity is regarded by them as a synthetic, factual statement 
(usually interpreted in terms of the frequency concept probability₂). But it 
cannot be confirmed empirically because such a procedure would use the 
method of induction which in turn presupposes the statement. Thus, they 
say, at this point empiricism must be sacrificed. This is, for instance, the 
conclusion to which Bertrand Russell comes in a detailed and 
thoroughgoing analysis of the presuppositions of science ([Knowledge], chaps. v 
and vi). 

Our conception of the nature of inductive inference and inductive prob- 
ability leads to a different result. It enables us to regard the inductive 
method as valid without abandoning empiricism. According to our con- 
ception, the theory of induction is inductive logic. Any inductive state- 
ment (that is, not the hypothesis involved, but the statement of the in- 
ductive relation between the hypothesis and the evidence) is purely 
logical. Any statement on probability₁ or estimation is, if true, analytic. 
This holds also for the statements of the probability of uniformity or the 
estimate of uniformity ((7a) and (7b), and likewise (8a) and (8b)). Since 
they are not synthetic, no empirical confirmation is required. Thus the 
earlier difficulty disappears. The opponents would perhaps say that the 
statement of the probability of uniformity must be taken as a factual 
statement because otherwise X would have no assurance of success in the 
long run. Our reply is: it is not possible to give X an assurance of success 
even in the long run, but only of the probability of success, as in statement 
(6a); and this statement is itself analytic. But can X take a practical de- 
cision if he has as a basis merely an analytic statement, one that does not 
say anything about the world? In fact, X has as a basis for his decision two 
statements: first a factual statement of his total observational evidence, 
and second an analytic statement of probability₁. The latter does not add 
anything to the factual content of the first, but it makes explicit an in- 
ductive-logical relation between the evidence and the hypothesis in ques- 
tion. In our earlier example this inductive statement has the form (2) for 
the hypothesis (1). Thus X learns from (2) that his evidence gives more 
support to the prediction of rain than to that of non-rain. Therefore it is 
reasonable for him to take suitable action; for example, to take his um- 
brella or to bet on rain rather than on non-rain. For a practical decision 
is reasonable if it is made according to the probabilities with respect to the 
available evidence, even if it turns out to be not successful. Going back to 
the general problem, it is reasonable for X to take the general decision of 
determining all his specific decisions with the help of the inductive meth- 
od, because the uniformity of the world is probable and therefore his suc- 
cess in the long run is probable on the basis of his evidence, even though he 
may find at the end of his life that he actually was not successful and that 
his competitor who made his decisions in accordance not with proba- 
bilities but with arbitrary whims was actually successful. 

Later (in Vol. II), after constructing a definition of degree of 
confirmation as an explicatum of probability₁ (compare § 110A) and, based upon 
this, a definition of an estimate-function (compare § 100A), we shall show 
that inductive statements on the uniformity of the world of a kind similar 
to (8a) or (8b) are indeed analytic, because deductively provable on the 
basis of the definitions mentioned. We shall also propose definitions as 
tentative explications for the degree of uniformity and its opposite, the 
degree of randomness. This will make it possible to formulate and prove 
also statements similar to (7a) or (7b). The whole problem of the 
justification and the presuppositions of the inductive method, and in particular 
of its application in the determination of practical decisions, will then be 
discussed in greater detail and in more exact, technical terms. What has 
been said here should be regarded merely as preliminary remarks in non- 
exact terms of the explicanda, intended to show in outline the direction 
in which we look for a solution of the problem. 


§ 42. Probability₁ and Probability₂ 


A. The word ‘probability’ had originally only the sense of probability₁. It
is no more than about a hundred years ago that some writers used it in the sense
of probability₂. This shift in sense was made inadvertently. It seems that the
ambiguity of elliptical formulations of probability statements and a lack of 
distinction between frequency and an estimate of frequency played some part 
in the historical origin of the new sense. B. Many probability statements made 
by scientists, actuaries, and practical statisticians are based on statistical re- 
sults concerning observed frequencies and lead to expectations of certain fre- 
quencies in the future. An analysis of these statements shows that they can be 
interpreted not only as statements on probability₂ but also as statements on
probability₁ with respect to statistical evidence (in the traditional terminology,
probability statements “a posteriori”). 


A. The Shift in Meaning of the Word ‘Probability’ 


We have seen that the word ‘probability’ used in contemporary science 
has sometimes the meaning of probability₁, that is, degree of confirmation,
and sometimes that of probability₂, that is, relative frequency. Thus the
questions arise: what was the original meaning of the word, and how did 
it acquire a second meaning? 

The first question is easily answered. The etymology of the word ‘prob- 
able’ and corresponding words in other languages, e.g., German ‘wahr- 
scheinlich’, French ‘vraisemblable’, Latin ‘probabilis’ and ‘verisimilis’, 
shows clearly that these words were used originally in everyday speech 
for something that is not certain but may be expected to happen or pre- 
sumed to be the case. It is easily seen how this common use led to the 
similar but somewhat more specific use in early books on probability, 
where the term ‘probability’ was meant in the sense of ‘evidential support 
for an assumption (or event)’ or ‘rational credibility of an assumption’,
and, more specifically, as ‘numerical degree of this support or credibility’.
In other words, the word ‘probability’ had the sense of what we have
called probability₁. Its use in the sense of probability₂ is of relatively recent
date; it goes back not more than about a hundred years. The development
of this new meaning out of the older one can be made understandable
from each of two different points of view, referring to two different
situations in which the word in its older sense was used. We shall now analyze
both of them in turn.

Let us begin with the assumption that, within a certain group of scientific
writers about the middle of the last century, the word ‘probability’
was commonly used in the sense of probability₁. It was more or less clear
that it was applicable to an unknown event or hypothesis with respect to 
a given body of evidence, although the customary formulations often 
omitted explicit reference to this evidence. Now let us consider the case 
in which the evidence gives statistical information concerning a certain 
population and, in particular, states the relative frequency of a certain 
property M within the population, and the hypothesis is the assumption 
that an individual, whose characteristics are unknown except that he be- 
longs to the population, has the property M. (This will later be called a 
case of the direct inductive inference, § 44B.) As an example, suppose 
that an observer X has the following knowledge: 


(1) ‘The relative frequency of myopia among the inhabitants of Chi- 
cago is 1/5’, 

and considers the hypothesis: 

(2) ‘John Doe is myopic’, 
where ‘John Doe’ is defined as ‘the inhabitant No. 117 of Chicago’ so that 
the statement ‘John Doe is an inhabitant of Chicago’ is analytic. If now 
X wanted to make a statement concerning the probability, in the sense of 
probability₁, of the assumption (2), a complete formulation would have to
be of the following form: 

(3) ‘The probability of (2) with respect to (1) is 1/5’. 
The numerical value of the probability is in this case equal to the known 
relative frequency. This equality was generally assumed on the basis of 
the classical conception of probability, and our theory will lead to the 
same result (T94-1e). However, complete formulations like (3) were seldom
used before Keynes, as mentioned earlier (§ 10A). X, as a man of the
last century, was apt to use instead the following elliptical formulation: 
(4) ‘The probability that John Doe is myopic is 1/5’.

X was, of course, aware that this probability statement had something to 
do with the frequency statement (1). But he did not clearly recognize that 
the frequency statement ought to be an essential part of the probability 
statement; he regarded it merely as the ground, the given knowledge, 
from which he had derived the latter. Therefore, if he felt instinctively the 
need of referring to the frequency in connection with the probability, he 
might do it in a form like this: 


(5) ‘Since the relative frequency of myopia among inhabitants of Chi- 
cago is 1/5, the probability that John Doe is myopic is 1/5’. 


He might also assert a generalized statement in conditional form: 


(6) ‘If the relative frequency of a property M in a population K is q,
then the probability of an element of K being M is q’, 


and the following as a substitution instance of it: 


(7) ‘If the relative frequency of myopia among inhabitants of Chicago 
is 1/5, the probability that John Doe is myopic is 1/5.’


As explained earlier (§ 10A), conditional formulations of this kind, al- 
though not quite correct and sometimes misleading, were quite customary; 
in particular, (6) has the customary form of a general theorem in the tra- 
ditional theory of probability. Therefore X would regard (6) as analytic, 
and likewise (7), since it is an instance of (6). Furthermore, since in the 
case of a relative frequency different from 1/5 the probability would likewise
be different from 1/5, X might regard the converse of (7) likewise as
true and analytic. This would naturally lead him to the belief that the
two components in (7), that is, (1) and (4), were logically equivalent. Thus 
it becomes understandable that X, when he wished to communicate the 
statistical fact (1) concerning the relative frequency of M, would use the 
formulation: 


(8) ‘The probability of an inhabitant of Chicago being M is 1/5’,


which seemed to him to follow from (1) and indeed to have the same mean- 
ing. In this way, the word ‘probability’ became for him synonymous 
with ‘relative frequency (in the whole population)’ and hence acquired 
the sense of probability₂.

In the second situation to be considered now the evidence e describes 
an observed sample taken from a population K and the hypothesis h says
that an unobserved element of K has the property M. (A case of this kind
will be called a singular predictive inference, § 44B.) Suppose X finds the
following result: 


(9) ‘The probability₁ of h with respect to e is 1/3.’


As we have seen previously (§ 41D (8)), this statement is logically equiva- 
lent to the following: 

(10) ‘The estimate of the relative frequency of M in any unobserved 
class, and hence also in the whole unobserved part of K, on the 
evidence e is 1/3.’ 

Although earlier authors did not state explicitly the equivalence of (9) 
to (10) and presumably were not aware of it with full clarity, it seems that 
they, nevertheless, felt this connection more or less instinctively. This is 
shown by the fact that they often made a transition from a statement on 
probability to one on an expected relative frequency in a form like this: 
‘The probability of an individual’s being M is 1/3; therefore we may ex- 
pect to find among future cases one-third who will exhibit the property M.’ 
The phrase ‘we may expect to find’ is rather ambiguous. As explained 
earlier (§ 41D), the author is right if the phrase is meant to refer to an 
estimate but wrong if it expresses a prediction. Now it seems that some- 
times a writer was not quite clear in his own mind whether he meant to 
state an estimate or a prediction of the future relative frequency. In a 
case of this kind it may happen that a statement containing the word ‘probability’
is first meant in the traditional sense, that is, probability₁, then
correctly interpreted as stating an estimate of relative frequency, and,
finally, due to a lack of distinction between an estimate and a predicted
value, acquires a new interpretation as a factual statement on the future
relative frequency; in other words, ‘probability’ is inadvertently shifted
from the old sense of probability₁ to the new sense of probability₂.

Thus the transition from the old conception of probability to the newer 
one is sometimes concealed by ambiguous formulations, as a picture on a 
screen blurs over into a new one so that it is impossible to mark a clear-cut 
point at which the change occurs. It seems to me that this is exemplified 
in certain formulations by Leslie Ellis, which Keynes regards as the first 
appearance of the frequency conception of probability ([Probab.], pp.
92 f.). In a paper read in 1842 (hence before the appearance of Cournot’s
work to be mentioned soon) and published in 1844 (not, as Keynes
says, 1843), Ellis says: “If the probability of a given event be correctly
determined, the event will on a long run of trials tend to recur with fre- 
quency proportional to their [sic] probability. This is generally proved 
mathematically. It seems to me to be true a priori. . . . I have been unable
to sever the judgment that one event is more likely to happen than
another from the belief that in the long run it will occur more frequently” 
([Foundations], pp. 1 f.). The phrase “the belief that” shows the typical
ambiguity earlier discussed. It can be interpreted as an unclear reference 
to an estimate, but also as a formulation, unnecessarily psychologistic, of 
a plain prediction of relative frequency. The phrase “will tend to recur 
with frequency . . .” in the first sentence quoted is likewise ambiguous. 
Presumably it is not meant in the sense of “will recur. . .”, since that 
would obviously be false. More likely is it to be interpreted in the sense 
that the specified frequency has a high probability, hence as a loose for- 
mulation of Bernoulli’s theorem; this interpretation seems confirmed by 
the subsequent sentence: “This is generally proved mathematically”.
The phrase “true a priori” in the third sentence means probably “immediately
following from a definition”. The whole quotation and later passages
of a similar nature (op. cit., p. 3) give the impression that Ellis felt
that there is some relation between probability and relative frequency 
without being able to make it clear to himself whether a probability value 
q means an estimate q of relative frequency, or a high probability of a
relative frequency q, or simply a relative frequency q. His reflections are 
perhaps to be regarded as the historically first step in the transition of the 
meaning of the word ‘probability’ from probability₁ to probability₂, and
we see that this first step was made in—and was perhaps psychologically 
due to—a foggy state of mind characterized by a lack of distinction be- 
tween various closely related but nonidentical concepts. 

The next step was made by A. Cournot ([Exposition] [1843]). He likewise
combines the classical definition of probability, hence probability₁,
with an interpretation in terms of relative frequency ([Exposition], p. iii, 
quoted by Keynes [Probab.], p. 92 n.) without being aware of their in- 
compatibility. [It seems to me that George Boole ([Laws] [1854]) cannot 
be regarded as a representative of the frequency conception as is some- 
times done. It is true that on a few occasions he makes indications toward 
a frequency interpretation. But they are not meant as a general definition 
of probability. The basic concept used throughout most of his systematic 
developments is unmistakably probability₁.]

John Venn ([Logic] [1866]), more than twenty years after Cournot, was 
the first to advocate the frequency concept of probability₂ unambiguously
and systematically as explicandum and also the first to propose as expli- 
catum for it the concept of the limit of relative frequency in an infinite 
series. Although his conception influenced the views of some other writers, 
among them Charles Sanders Peirce (1878), it was only half a century 
later that comprehensive systematic theories were constructed which
took probability₂ as their basis. This was done, on the one hand, by Hans
Reichenbach and Richard von Mises and, on the other hand, by R. A. 
Fisher and subsequently by the majority of contemporary mathematical 
statisticians. [Reichenbach used the frequency concept first in [Begriff] 
(1915), the limit concept in [Kausalität] (1930); the systematic construction
of the theory was given in [Axiomatik] (1932) and further developed
in [Wahrsch.] (1935). Mises defined probability as the limit of relative 
frequency first in [Grundlagen] (1919); the systematic development of his 
whole theory was given in [Wahrsch.] (1931). Fisher constructed the foun- 
dations of his theory in [Foundations] (1922) and developed it in numerous 
further publications.] 

It is surprising to see that hardly any one of these representatives of 
the frequency conception, beginning with Venn, seems to be aware of the 
fundamental change that has taken place in the meaning of the word ‘probability’.
It is true that they criticize Bayes, Laplace, and other classical
and later authors. But they seem to believe that their new conception involves
merely a modification and sometimes a rejection of certain assertions,
theorems or rules, concerning probability made by the earlier writers,
due to the choice of an improved explicatum. They do not seem to
recognize that the explicandum itself has been changed and that, conse- 
quently, their theories deal with a subject matter entirely different from 
that of the earlier writers. This may be due, at least partly, to the fact 
explained above that the first steps in the transition from probability₁ to
probability₂ were involved in ambiguities and confusions. It should be
noticed that the criticism just made is by no means directed against the 
frequency theories themselves. These theories are of great importance for 
statistical work and therefore for the whole of science. Our remarks are
only intended to point out the historical fact that the basic concept and
the problems of the classical theory of probability differ in a more funda- 
mental sense from these theories than is usually recognized. 


B. On the Interpretation of Given Probability Statements 


The previous discussion on probability₁ and, in particular, its explanation
as an estimate of frequency (§ 41D), and, further, the remarks just
made concerning the historical development leading from probability₁ to
probability₂, also make it clear that the concept of probability₁ is closely
connected with the concept of frequency. Therefore, it is often not easy
to discover whether a given statement on probability is to be interpreted
in terms of probability₁ or probability₂. In the following we shall analyze
certain probability statements which involve frequency and therefore
may appear at first as statements on probability₂, but we shall find that
they may be interpreted as dealing with probability₁.

Many writers since the classical period have said of certain probability 
statements that they are ‘based on frequencies’ or ‘derived from frequencies’.
Nevertheless, these statements often, and practically always if
made before the time of Venn, speak of probability₁, not of probability₂.
In our terminology they are probability₁ statements referring to an evidence
involving frequencies. We have explained earlier (§ 10A) that in
cases of this kind the frequency statement is not a premise of the probability
statement but part of its subject matter, and hence the customary
phrase ‘derived from frequencies’ is misleading. It would be more correct 
to say that in these cases the probability is determined with the help of 
a given frequency and its value is either equal or close to that of the fre- 
quency. The frequency stated in the evidence may either refer to the whole 
population or to an observed sample. (As mentioned above, we speak in 
the first case of a direct probability or a direct inductive inference, in the 
second case of a predictive one.) In the traditional terminology the prob- 
ability in the second case was often called a ‘probability a posteriori’, in 
distinction to a ‘probability a priori’. The latter term was used in cases 
where the evidence did not state a frequency but was very weak or even 
tautological (a ‘statement of ignorance’) and the value of the probability 
was determined chiefly by the use of the principle of indifference. Consider,
for example, a statement to the effect that the probability of throwing
an ace with a given die is 1/6. If the evidence, which was usually not
referred to explicitly in the probability statement but merely indicated 
by the description of the situation, said only that the die had the shape of 
a regular cube, the statement would be said to give a probability a priori.
If, on the other hand, the evidence described the results of six thousand
throws made with the die and stated that one thousand of them were aces,
the probability was called a posteriori. Thus, even in the latter case, the 
concept of probability involved is probability₁, not probability₂, although
its value is determined on the basis of a frequency. It is important to 
notice this fact, because some writers have regarded the use of probability 
a posteriori as an indication of the frequency conception. That this use 
is actually still a case of probability₁ is clearly seen from the general description
of the two methods by Bernoulli, who introduced the terms
‘a priori’ and ‘a posteriori’ for them ([Ars], Part IV, chap. iv). Neverthe- 
less, the fact that in the course of the last century the use of the principle 
of indifference was more and more regarded with suspicion and, consequently,
the use of probability a posteriori was more emphasized, was presumably
one of the psychological factors which helped in preparing the
way for the frequency conception. 

We have earlier (§ 9) taken the statement ‘The probability of throwing
an ace with this die is 1/6’ as a typical example of probability₂. The preceding
discussion shows, however, that the same statement may also be
interpreted as referring to probability₁. In order to discover which interpretation
the person X who makes the statement has actually in mind, we
have to take into consideration the context of the statement and the
use X makes of it. Let us analyze the situation somewhat more in detail;
we shall find that certain circumstances which frequentists might be inclined
to regard as indicative of probability₂ do not in fact preclude an
interpretation in terms of probability₁. Let us consider a modified example
of an irregular or loaded die with a probability different from 1/6.
The frequentists have pointed out, correctly, that in this case the classical
definition of probability in terms of possible and favorable cases is not
applicable, at least not without rather artificial constructions; from this
they have inferred the conclusion, which we shall question, that in this
case only the concept of probability₂ is applicable. Suppose X asserts the
following statement:


(13) ‘The probability of throwing an ace with this die is 0.15.’


We want to determine in which sense this statement is meant by X. Here, 
as often, it is not advisable to ask direct questions like ‘What do you 
mean?’ or ‘Which meaning does the word ‘probability’ have for you?’ We 
ask instead: ‘What is the basis of your assertion? What observations led 
you to the value stated?’ The frequentists emphasize the fact that a probability
statement in their sense is not obtained by a merely logicoarithmetical
procedure like counting possible and favorable cases but by statistical
observations. Therefore, in order to fit our example to this conception,
let us assume that X answers as follows:


(14) ‘I have made 1,000 throws with this die, of which 150 yielded an
ace; no other results of throws with this die are known to me.’ 


The frequentists will be inclined to take this answer as indicating an interpretation
of the original statement (13) in the sense of probability₂. It
is true that this interpretation is possible, but it is not the only one pos- 
sible. We may try to clarify the situation by asking X to state more ex- 
plicitly the connection between (14) and (13) as he sees it. Suppose he 
replies as follows: 




(15) ‘Since there were 150 aces among the observed 1,000 throws, the 
probability of an ace is 0.15.’ 

He may even add: ‘This will be obvious for anyone who uses the word 
‘probability’ in the same sense as I do.’ But this sense is still not unam- 
biguously determined by (15). It is true, this statement may suggest the 
sense of probability₂. However, it is also possible that it is meant by X in
the traditional sense of a probability a posteriori, that is, in the following 
sense: 


(16) ‘The probability₁ of the assumption that a future throw with this
die will yield an ace with respect to the evidence (14) is 0.15.’ 


[The use of a formulation like (15) in the sense of (16) is customary but 
not quite correct; see the above discussion of (5); the present situation is
analogous to the earlier one but slightly different because it involves a 
predictive probability, not a direct one.] 

Since probability₂ means relative frequency in the long run, let us formulate
a statement concerning the future frequency:


(17) ‘The relative frequency of aces among future throws of this die 
in the long run will be 0.15’, 


and then let us ask X for his judgment about this prediction from the 
point of view of his original statement (13) and the observational report 
(14); perhaps his answer will reveal whether his probability statement 
(13) was meant in the sense of probability₂. We may assume that his answer
will be somewhat like this:


(18) ‘It is not possible, of course, to make predictions with certainty; 
but, in view of the observational report (14), it seems sensible to 
expect a frequency of about the value 0.15 predicted in (17).’ 


A frequentist might now argue that by this answer X has accepted the 
statement (17) and, since this is a statement of probability₂, X has hereby
shown that also his original statement (13) was meant in the sense of
probability₂. Against this argument it must be pointed out that X did
not in (18) accept (17) as an outright prediction but rather as a reasonable 
expectation. It seems more adequate to interpret this as an inductive 
statement. [In Reichenbach’s terminology, (18) would be said to express 
an anticipation of a relative frequency as a “posit” ([Experience], p. 352). 
It seems to me that here again Reichenbach introduces, apparently with- 
out being aware of it, concepts which belong to inductive logic in our sense 
and hence can be based only on probability₁, not on probability₂.] In particular,
(18) may be interpreted in either one of the following two senses
(19) or (20): 

(19) ‘There is a high probability₁ with respect to the evidence (14) for
the prediction that the relative frequency of aces in a long series
of future throws with this die will lie within an interval around
0.15.’

(20) ‘The estimate of the relative frequency of aces in a series of future 
throws with this die with respect to the evidence (14) is 0.15.’ 


Both (19) and (20) are statements of inductive logic. The latter is, according
to our earlier explanations (§ 41D (8)), logically equivalent to
(16); therefore, it would suggest an interpretation of the original statement
(13) in the sense of probability₁.
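
The equivalence of (16) and (20) can be made plausible by a short calculation. What follows is only a sketch, under the assumption that the estimate of a magnitude on evidence e is its probability₁-weighted mean value and that c is additive over mutually exclusive hypotheses; the symbols s, h_i, rf, and est are introduced here for illustration and are not notation from the text.

```latex
% Assumed notation: h_i says that the i-th of s future throws yields an ace;
% rf is the relative frequency of aces among these s throws.
\[
  \mathrm{rf} \;=\; \frac{1}{s}\sum_{i=1}^{s} \mathbf{1}[h_i]
\]
% The estimate as a c-weighted mean; the second equality uses additivity.
\[
  \mathrm{est}(\mathrm{rf}, e)
  \;=\; \sum_{r} r \cdot c(\mathrm{rf} = r,\; e)
  \;=\; \frac{1}{s}\sum_{i=1}^{s} c(h_i, e)
\]
% If c(h_i, e) = 0.15 for every future throw i, as (16) asserts:
\[
  \mathrm{est}(\mathrm{rf}, e) \;=\; \frac{1}{s}\cdot s \cdot 0.15 \;=\; 0.15
\]
```

On these assumptions the value 0.15 in (20) is fixed by the value in (16) for any length s of the future series, which is why the two statements carry the same numerical content.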

The result of our analysis of the simple probability statement (13)
holds, of course, likewise for any other probability statement based on
statistical evidence and leading to expectations concerning certain future
relative frequencies. Thus it holds, for example, for the statements of a
physicist concerning the probability that the velocity of a molecule in a
given body of gas belongs to a certain value region, or concerning the probability
that the number of α-particles emitted by a given radioactive body
during the next hour lies in a certain interval, or the statement of an actuary
concerning the probability of death within the next year for a fifty-year-old
steel worker in Chicago. Any statement of this kind can be explicated
in two different ways; either (i) in the sense that the relative
frequency in the long run, in other words, probability₂, is q, or (ii) in the
sense that the probability₁ of a single instance of the kind in question with
respect to given statistical evidence, e.g., an observed relative frequency,
is q. Both reformulations contain the same numerical value q. Most of
those scientists who have not made a special study of the problems of
probability, and hence have not become partisans either of the Keynes-Jeffreys
school of probability₁ or of the frequency school of probability₂,
will perhaps refuse to tie themselves down to one of the two interpretations;
they will perhaps regard the distinction as of merely academic interest.
In a certain sense they are right. There is not much difference between
the practical consequences drawn from (i) or (ii), since, as we have
seen earlier, (ii) means the same as the statement that the estimate of
relative frequency is q. Therefore, the scientist will proceed in either case
in certain respects as if he knew that the relative frequency will be q.
There is, however, the following difference. In the case (i) the statement
in question is complete and has factual content, while in the case (ii) it
is elliptical and analytic, expressing a logical relation between two factual
statements. Consequently, there will be a difference concerning the future
procedure in the following respect, if further observations exhibit a value
of the relative frequency deviating considerably from q. The statement in
sense (i) is rejected as probably false; the statement in sense (ii), however,
remains valid but becomes irrelevant for practical purposes and is replaced
by a new, likewise analytic, statement referring to the increased
evidence.


§ 43. Inductive and Deductive Logic 


A. Can a system of inductive logic as a theory of the degree of confirmation 
contain exact rules? This is sometimes denied for the reason that the procedure 
of induction is not rational but intuitive. Now it must be admitted that there is 
no effective procedure for finding a suitable hypothesis h for the explanation of 
a given observational report e, nor, if a hypothesis h is proposed, for determining
c(h,e). However, this is no reason against the possibility of an inductive
logic because in deductive logic there is likewise no effective procedure for the 
solution of the corresponding problems. On the other hand, there are effective 
procedures for testing whether an alleged proof for a logical theorem is correct, 
e.g., in deductive logic for a theorem of the form ‘e L-implies h’, and in inductive
logic for a theorem of the form ‘c(h,e) = r’.

B. Inductive logic is constructed from deductive logic by the adjunction of a 
definition of c. Hence inductive logic presupposes deductive logic. The analogy 
between these two fields of logic is illustrated by examples both for purely logi- 
cal statements and for those involving the application to knowledge situations. 
However, truth and knowledge of the evidence e, although relevant for these ap- 
plications, are irrelevant for the validity of the statements in inductive logic, as 
for those in deductive logic. 


A. On the Possibility of Exact Rules of Induction 


The question whether an inductive logic with exact rules is at all pos- 
sible is still controversial. But in one point the present opinions of most 
philosophers and scientists seem to agree, namely, that the inductive 
procedure is not, so to speak, a mechanical procedure prescribed by fixed 
rules. If, for instance, a report of observational results is given, and we 
want to find a hypothesis which is well confirmed and furnishes a good 
explanation for the events observed, then there is no set of fixed rules 
which would lead us automatically to the best hypothesis or even a good 
one. It is a matter of ingenuity and luck for the scientist to hit upon a 
suitable hypothesis; and, if he finds one, he can never be certain whether 
there might not be another hypothesis which would fit the observed facts 
still better even before any new observations are made. This point, the 
impossibility of an automatic inductive procedure, has been especially 
emphasized, among others, by Karl Popper ([Logik] §§ 1-3 and elsewhere),
who also quotes a statement by Einstein: “There is no logical
way leading to these . . . laws, but only the intuition based upon a sympathetic
understanding of experience” (“. . . die auf Einfühlung in die
Erfahrung sich stützende Intuition”) (Mein Weltbild [1934], p. 168);
compare also Einstein, On the Method of Theoretical Physics (Oxford, 
1933), pages 11-12. The same point has sometimes been formulated by 
saying that it is not possible to construct an inductive machine. The latter 
is presumably meant as a mechanical contrivance which, when fed an 
observational report, would furnish a suitable hypothesis, just as a com- 
puting machine when supplied with two factors furnishes their product. 
I am completely in agreement that an inductive machine of this kind is 
not possible. However, I think we must be careful not to draw too far- 
reaching negative consequences from this fact. I do not believe that this 
fact excludes the possibility of a system of inductive logic with exact 
rules or the possibility of an inductive machine with a different, more 
limited, aim. It seems to me that, in this respect, the situation in inductive 
logic is similar to that in deductive logic. This will become clear by a 
comparison of the tasks of these two parts of logic. 

When considering the kinds of problems dealt with in any branch of 
logic, deductive or inductive, one distinction is of fundamental impor- 
tance. For some problems there is an effective procedure of solution, but 
for others there can be no such procedure. A procedure is called effective 
if it is based on rules which determine uniquely each step of the procedure 
and if in every case of application the procedure leads to the solution in a 
finite number of steps. A procedure of decision (‘Entscheidungsverfahren’) 
for a class of sentences is an effective procedure either, in semantics, for 
determining for any sentence of that class whether it is true or not (the 
procedure is usually applied to L-determinate sentences and hence the 
question is whether the sentence is L-true or L-false), or, in syntax, for 
determining for any sentence of that class whether it is provable in a given 
calculus (cf. Hilbert and Bernays [Grundlagen], Vol. II, § 3). A concept 
is called effective or definite if there is a procedure of decision for any given 
case of its application (Carnap [Syntax] § 15; [Formalization] § 29). An 
effective arithmetical function is also called computable (A. M. Turing, 
Proc. London Math. Soc., Vol. 42 [1937]). 

Now let us compare the chief kinds of problems to be solved in deduc- 
tive logic and in inductive logic. Our aim is to discover whether inductive 
procedures are less regulated by exact rules than deductive procedures, as 
some philosophers believe. 
In order to simplify the comparison, let us regard deductive logic, in- 
cluding mathematics, as the theory of L-implication, the explicatum for 
logical entailment (§ 20), and inductive logic as the theory of degree of 
confirmation, the quantitative explicatum of probability₁. At this stage in 
our discussions we do not yet know whether it is possible to find an ade- 
quate quantitative explicatum for probability₁. Therefore the following 
explanations are meant at present merely in a hypothetical sense: if there 
is an adequate explicatum c and hence a quantitative inductive logic as its 
theory, what is its nature in comparison with deductive logic? 

In each of the two branches of logic we may distinguish three kinds of 
fundamental problems concerning the application of the fundamental 
concepts, viz., L-implication or c, respectively. 


I. First Problem: To Find a Conclusion 

a. Deductive logic. Given: a sentence e as a premise (it may be a con- 
junction of a set of premises); wanted: a conclusion h L-implied by e and 
suitable for a certain purpose. For instance, a set of axioms for geometry 
is given; theorems concerning certain configurations are wanted. The es- 
sential point is the fact that there is no effective procedure for the solution 
of problems of this kind. The work of a logician or a mathematician con- 
sists to a great extent in attempts to solve problems of this kind. Some 
laymen imagine a mathematician to be chiefly occupied with computa- 
tion, though of a sort more complicated than computation in elementary 
arithmetic. In fact, however, there is a difference in principle, not only in 
degree of complexity, between the two kinds of activities. To find the 
product of 15 and 17 is a simple task; to compute the square root of 7 to 
five decimals is more complicated; to compute the value of a number de- 
fined by a definite integral, e.g., e or π, to five decimals is still more com- 
plicated. All these tasks of computation, however, are fundamentally of 
the same nature, irrespective of the degree of complexity; for all of them 
there is an effective procedure; and this is characteristic of computation. 
The mathematician, on the other hand, cannot find fruitful and inter- 
esting new theorems, say, in geometry, in algebra, in the infinitesimal cal- 
culus, by computation or by any other effective procedure. He has to find 
them by an activity in which rational and intuitive factors are combined. 
This activity is not guided by fixed rules; it requires a creative ability, 
which is not required in computation. 
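
The character of computation as an effective procedure may be illustrated by a minimal sketch in Python. Both functions below are fixed by rules at every step and terminate after a finite number of steps; they correspond to the first two tasks mentioned above. The function names `product` and `sqrt_truncated` are hypothetical, and the use of an integer square root to obtain truncated decimals is merely one convenient method, assumed here for illustration.

```python
from math import isqrt


def product(a: int, b: int) -> int:
    """Multiply a by a non-negative b by repeated addition; each step is fixed by the rule."""
    total = 0
    for _ in range(b):
        total += a
    return total


def sqrt_truncated(n: int, decimals: int) -> str:
    """Square root of a non-negative integer n, truncated (not rounded) to `decimals` places."""
    scaled = isqrt(n * 10 ** (2 * decimals))      # exact integer square root
    whole, frac = divmod(scaled, 10 ** decimals)
    return f"{whole}.{frac:0{decimals}d}"


print(product(15, 17))        # prints 255
print(sqrt_truncated(7, 5))   # prints 2.64575
```

No comparable function can be written that takes a set of axioms and returns the fruitful and interesting theorems; that is the difference in principle stressed in the text.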

b. Inductive logic. Given: a sentence e as evidence; wanted: a hypothesis 
h which is highly confirmed by the evidence e and suitable for a certain pur- 
pose. For instance, a report concerning observations of certain phenomena 
on the surface of the sun is given; a hypothesis concerning the physical state 
of the sun is wanted which, in combination with accepted physical laws, 
furnishes a satisfactory explanation for the observed facts. Or, a historical 
report about some acts of Napoleon is given; a hypothesis concerning his 
character, his knowledge at the time in question, and his conscious and 
unconscious motives is wanted which would make his acts understandable. 
There is no effective procedure for solving these problems; that is the 
point emphasized by Einstein and Popper, as mentioned above. However, 
we see now that this feature is by no means characteristic of inductive 
thinking; it holds in just the same way for the corresponding deductive 
problems. 


II. Second Problem: To Examine a Result 

a. Deductive logic. Given: two sentences e and h; wanted: an answer to 
the question whether e L-implies h. For instance, on the basis of an axiom 
set e of geometry, a mathematician finds, as a conjecture, an interesting 
sentence h concerning the angles of a triangle; this constitutes a tentative 
solution of a problem of the first kind; now he wants to find out whether h 
is actually deducible from e. Here, again, there is, in general, no effective 
procedure; in other words, L-implication is, in general, not an effective 
concept. Problems of this kind are again an essential part of any work in 
logic and mathematics. They are closely connected with problems of the 
first kind; for when a mathematician has found a theorem, he wants to 
give an exact proof for it so as to compel the assent of others. Finding a 
theorem is largely a matter of extrarational factors, not guided by rules. 
Constructing a proof is often called a rational procedure because here 
fixed rules have to be taken into consideration. However, the decisive 
point must not be overlooked: the rules of deduction are not rules of pre- 
scription, but rules of permission and of prohibition. That is to say, the 
rules do not tell the logician X which step to take at a given point in the 
course of a deduction; in other words, they do not constitute an effective 
procedure. The rules tell X merely which steps are permitted and thereby 
they say implicitly that all other steps are prohibited; they leave it to X 
to choose one of the steps permitted. Thus, here again, it depends upon 
X’s ingenuity and luck whether he solves the problem, that is, whether he 
finds a series of steps permitted by the rules, such that they lead from 
e to h. 

More specifically, the situation is this. Only in the most elementary 
part of logic, in propositional logic (see above, § 21), is there a general 
method of decision, viz., the customary method of truth-tables (see 
§ 21B). As soon as we enter the next higher field of logic, the so-called 
lower functional logic as represented, for instance, by our language sys- 
tem 𝔏 (§§ 15 ff.), there cannot be a method of decision for all sentences. 
[This has been shown by Alonzo Church; see Amer. Journal of Math., 58 
(1936), 345, and Journal of Symbolic Logic, 1 (1936), 40.] This holds a 
fortiori in the higher parts of logic, including arithmetic and the higher 
branches of mathematics. This does not exclude the possibility of methods 
of decision restricted to special kinds of sentences; and indeed several 
such methods for certain kinds within lower functional logic have been 
developed and are used as helpful instruments. 
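
The method of truth-tables mentioned above can be sketched in a few lines of Python. This is an illustrative sketch only: sentences are represented, by assumption, as Python functions from a valuation to a truth value, and the name `l_implies` is hypothetical. The procedure examines every row of the table and reports that e L-implies h if no row makes e true and h false.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Valuation = Dict[str, bool]
Sentence = Callable[[Valuation], bool]


def l_implies(e: Sentence, h: Sentence, atoms: Sequence[str]) -> bool:
    """Decide by truth table whether e L-implies h over the given atomic sentences."""
    for values in product([False, True], repeat=len(atoms)):
        row = dict(zip(atoms, values))
        if e(row) and not h(row):
            return False          # a row in which e holds and h fails
    return True                   # no such row exists


# e: '(p ⊃ q) . p'; h: 'q'
e = lambda v: ((not v["p"]) or v["q"]) and v["p"]
h = lambda v: v["q"]

print(l_implies(e, h, ["p", "q"]))   # prints True
print(l_implies(h, e, ["p", "q"]))   # prints False
```

The loop always terminates because a sentence with n atomic components has only 2^n rows. It is this finiteness that is lost in the lower functional logic, where the individuals may be infinite in number.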

b. Inductive logic. Here, the problems of the second kind occur in two 
different forms, because here we are concerned not only with two sentences 
but, in addition, with a third item, a number. (i) Given: two sentences e 
and h; wanted: the value of c(h,e), i.e., the degree of confirmation of h 
on the evidence e. (ii) Given: two sentences e and h and a number r; 
wanted: an answer to the question whether c(h,e) = r. For instance, a 
physicist has found, as a conjecture, a hypothesis h which he believes to 
be a good explanation for the results e of certain experiments; this is his 
solution, intuitively found, of a problem of the first kind; now he wants to 
find out whether h is indeed highly confirmed by e and, more precisely, (i) 
what is the value of c(h,e); or, if he has made the guess that this value 
is r, he wants to find out (ii) whether indeed c(h,e) = r. There is, in gener- 
al, no effective procedure for these problems; in other words, c is, in gen- 
eral, not a computable function. This does not exclude the existence of 
methods of computation for c in restricted classes. We shall later, in our 
system of quantitative inductive logic, give such methods for the follow- 
ing cases: (1) for all cases where h and e are molecular sentences in any 
system 𝔏, (2) for all cases where h and e are sentences of any form, molecu- 
lar or general, in any finite system 𝔏_N, (3) for certain cases in a system 𝔏^π_∞ 
(i.e., an infinite system containing only primitive predicates of degree one, 
§ 31). More methods of this kind could be found for other restricted 
classes of cases. However, no general method of computation for c is pos- 
sible with respect to an infinite system 𝔏_∞ which contains also relations; 
because such a method would immediately yield a method of decision for 
all sentences of this system, which is known to be impossible, as stated 
under (a). Thus, if e and h do not belong to one of the classes for which 
a method of computation exists and is known, the inductive logician X 
who wants to determine the value of c(h,e) cannot simply follow a way 
prescribed by fixed rules, but just has to try to hit upon a way to a solution 
by his skill and good luck. This, however, is not a peculiar feature of in- 
ductive logic but holds in just the same way for deductive logic, as we 
have seen. 
Thus it is true that an inductive machine is impossible for finding a 
suitable hypothesis (first problem) and also for examining whether a given 
hypothesis is suitable (second problem). But, then, a deductive machine 
is likewise impossible if it is intended to solve the corresponding deductive 
problems of finding a suitable L-implied theorem or of examining whether 
a proposed theorem is indeed L-implied. However, for a restricted domain 
as described above, an inductive machine for the determination of c(h,e) 
is possible, for example, for all cases in which e and h do not contain 
variables with an infinite range of values; just as a deductive machine is 
possible which decides whether or not e L-implies h. 
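
An inductive machine of this limited kind can be sketched in Python for a finite system. The sketch rests on two assumptions which are not established at this stage of the discussion and are made only for illustration: first, that c(h,e) is the weighted share of the range of e that lies within the range of h; second, that, unless another weighting `m` is supplied, every state description receives equal weight. The equal weighting is a placeholder and should not be taken as the explicatum to be constructed later. All names (`state_descriptions`, `degree_of_confirmation`, `m`) are hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence

State = Dict[str, bool]                 # a state description: one truth value per atomic sentence
Sentence = Callable[[State], bool]      # a sentence is represented by its truth condition
Measure = Callable[[State], Fraction]   # weight assigned to a single state description


def state_descriptions(atoms: Sequence[str]) -> List[State]:
    """Enumerate all state descriptions of a finite system with the given atomic sentences."""
    return [dict(zip(atoms, values))
            for values in product([False, True], repeat=len(atoms))]


def degree_of_confirmation(h: Sentence, e: Sentence, atoms: Sequence[str],
                           m: Optional[Measure] = None) -> Fraction:
    """Return the weighted share of the range of e that lies inside the range of h."""
    states = state_descriptions(atoms)
    if m is None:
        # ASSUMPTION for illustration only: every state description gets equal weight.
        m = lambda state: Fraction(1, len(states))
    range_e = [s for s in states if e(s)]
    weight_e = sum((m(s) for s in range_e), Fraction(0))
    if weight_e == 0:
        raise ValueError("c(h,e) is not defined when e has weight zero")
    weight_eh = sum((m(s) for s in range_e if h(s)), Fraction(0))
    return weight_eh / weight_e


atoms = ["Pa", "Pb", "Pc"]
e = lambda s: s["Pa"] or s["Pb"]    # evidence: 'Pa v Pb'
h = lambda s: s["Pa"]               # hypothesis: 'Pa'
print(degree_of_confirmation(h, e, atoms))   # prints 2/3
```

The enumeration is finite only because the sentences contain no variables with an infinite range of values; with infinitely many individuals the list of state descriptions could not be completed, and the machine would fail in the same way as the corresponding deductive machine.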


III. Third Problem: To Examine a Given Proof 

a. Deductive logic. Given: e, h, and an alleged proof that e L-implies h; 
wanted: an answer to the question whether the alleged proof is actually 
a proof, that is, whether it is in accordance with the rules of deductive 
logic. For instance, a mathematician believes he has not only a solution 
of the first problem, for instance, a geometrical theorem h, but also a 
solution of the second problem, a proof that the axiom set e L-implies the 
theorem h; he wants to make sure that his belief is right, that is, that the 
proof is correct. For the solution of this problem there is an effective 
procedure, provided the proof is given completely. We have to distinguish 
here two different methods which are in customary use for proving that 
e L-implies h. (i) The first method consists in the construction of a se- 
quence of sentences in the object language, leading from e to h in accord- 
ance with rules of deduction. (ii) The second method consists in a proof 
in the metalanguage, leading to the semantical statement ‘e L-implies h’. 
Strictly speaking, an effective method for testing proofs can only be ap- 
plied if a set of deductive rules has been laid down and if the proof to be 
tested is formulated in such a detailed form that every step in it consists 
in a single application of one of the rules. This condition is not often ful- 
filled in method (i) and almost never in method (ii). The method for test- 
ing proofs, as they are usually formulated, is not effective in the strictest 
sense. However, we may say that it is practically effective in the following 
sense. Suppose a mathematician shows, by either method (i) or method 
(ii), that the theorem h is deducible from the geometrical axioms e; and 
suppose he uses in his proof, as is customary in geometry, the ordinary 
word language without explicit rules of deduction. Then we know what 
we have to do in order to examine the correctness of the proof. We ex- 
amine for every single step in the proof whether it is an instance of a 
simple deductive procedure which we know to be valid. The mathemati- 
cian has made the steps in such a way that he expects us to be able to 
carry out this examination for every step and to come to an affirmative 
result. If he has not overestimated our ability to recognize instances of 
L-implication, we shall affirm step for step and thereby recognize the 
whole proof as correct. Otherwise we have to ask him to split up the step 
which we are unable to judge into more and simpler steps, for which we 
are able to decide the question of correctness. Thus, in this examination of 
the proof, we are not entirely left to guessing, to a trial-and-error method 
as in problems of the first and second kind; instead, we know practically 
how to proceed and we expect that, under normal conditions, we shall 
reach a result in a finite number of operations, viz., the examinations of 
the steps of the given proof. In this sense we may say that we have a 
practically effective method. The result may also be formulated in this 
way: while L-implication is not an effective concept, the concept of proof 
for L-implication is effective, at least practically. 
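
The effectiveness of the concept of proof, in the strict sense which presupposes a completely formulated proof, may be illustrated by a minimal Python sketch of method (i). The sketch assumes, purely for illustration, a tiny object language in which a formula is either an atom or an implication, and a single rule of deduction, modus ponens; the names `check_derivation` and `follows_by_modus_ponens` are hypothetical. The checker does not look for a proof. It only examines, step by step, whether each line of a given sequence is permitted.

```python
from typing import List, Tuple, Union

# A formula is an atom (a string) or an implication ("->", antecedent, consequent).
Formula = Union[str, Tuple[str, "Formula", "Formula"]]


def follows_by_modus_ponens(step: Formula, earlier: List[Formula]) -> bool:
    """True if some earlier lines have the forms A and ("->", A, step)."""
    return any(("->", a, step) in earlier for a in earlier)


def check_derivation(premises: List[Formula], steps: List[Formula], goal: Formula) -> bool:
    """Examine each step once; accept only if every step is permitted and the last is the goal."""
    accepted: List[Formula] = []
    for step in steps:
        if step in premises or follows_by_modus_ponens(step, accepted):
            accepted.append(step)
        else:
            return False          # a step not permitted by the rules
    return bool(accepted) and accepted[-1] == goal


premises = ["p", ("->", "p", "q"), ("->", "q", "r")]
good = ["p", ("->", "p", "q"), "q", ("->", "q", "r"), "r"]
bad = ["p", "r"]                  # 'r' is asserted without the intermediate steps

print(check_derivation(premises, good, "r"))   # prints True
print(check_derivation(premises, bad, "r"))    # prints False
```

The number of operations is bounded by the length of the given sequence, which is why the examination terminates. The rejected sequence corresponds to the case in which the reader must ask the author to split up a step into more and simpler ones.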


The situation may be described more in detail as follows. A method of the 
kind (i) is usually applied in syntax with respect to a calculus K; here the rules 
constitute a definition of ‘direct C-implicate (directly derivable) in K’ (see, e.g., 
[Semantics] §§ 26-28). Now it is possible, although not customary, to apply an 
exactly analogous method in semantics, with respect to a semantical system S. 
Essentially the same rules are here formulated as definition of ‘direct L-impli- 
cate in S’. [Instead of constructing a chain leading from the premise e to h 
(called a derivation in the technical sense) one may also construct a chain with- 
out a premise leading to e ⊃ h (called a derivation with the null class of prem- 
ises or a proof in the technical sense; see [Semantics] § 26, formulation B); the 
difference is merely a technical one, the result is the same (for languages with- 
out free variables in sentences), see T20-1b.] Even if this method is used in a 
symbolic language for which explicit rules of deduction have been laid down, 
the proofs are rarely given in a complete form. They usually proceed by larger 
steps, such that each step consists of several applications of the rules and hence 
would be divided into several steps in a complete formulation. This abbre- 
viated formulation is, of course, convenient and even necessary in order to avoid 
enormous length of the proofs. In many cases, the object language used in 
method (i) is the ordinary word language (supplemented by some technical 
terms and symbols) without explicit rules of deduction; and in almost all cases 
this holds for the metalanguage used in method (ii). This is customary for the 
formulations of deductions in mathematics and in science. Likewise in this 
book, we use method (ii); the proofs are formulated in the word language as 
our metalanguage (as an example, see the proof of T19-3). Thus, in all these 
cases, the method of examining the proofs has only the weaker and somewhat 
vague practical effectiveness described above. 


b. Inductive logic. Given: e, h, and r, and an alleged proof that c(h,e) = 
r; wanted: an answer to the question whether the alleged proof is correct. 
For instance, a physicist believes he has found a solution of a problem of 
the first kind, say, a suitable hypothesis h on the basis of an observational 
report e, and, moreover, a solution of the problem of the second kind for 
this case, viz., what appears to him like a proof that c(h,e) = r; he wants 
to determine whether this is a correct proof. For the solution of this prob- 
lem, as for the analogous problem in deductive logic, there is a procedure 
which is at least practically effective. However, there is this difference: of 
the two methods (i) and (ii) earlier described, there is an analogue here 
only to the second, that is, a proof in the metalanguage for the semantical 
sentence ‘c(h,e) = r’. No analogue to the first method is known; and it 
seems doubtful whether a simple and convenient method of this kind 
could be found. [One might perhaps think of a procedure consisting in the 
construction of a sequence of sentences, with a real number expression 
attached to each sentence expressing the c of that sentence on the fixed 
evidence e. The sentence e itself with ‘1’ attached to it would be the be- 
ginning of the sequence, and h with an expression for the number r at- 
tached to it would be the end. The sentences would belong to the object 
language, as in a proof in method (i), but the numerical expressions would 
still be in the metalanguage.] Thus the situation is here the same as de- 
scribed earlier for method (ii) in deductive logic. A proof is given, formu- 
lated in the word language, which serves as a semantical metalanguage; 
and we test the correctness of the proof by examining for each step 
whether it is valid on the basis of the tacitly presupposed standards. Thus 
the procedure is practically effective in just the same sense as explained 
earlier (although it is not effective in the strictest sense unless deductive 
rules are laid down for the metalanguage). 


B. The Relation between Deductive and Inductive Logic 


Deductive logic may be regarded as the theory of the L-concepts, 
especially L-implication. These concepts can be based on the semantical 
concept of range, as we have seen (§ 20). Thus deductive logic, in this 
sense, is seen to be a part of semantics, that part which we sometimes call 
L-semantics. Inductive logic, in its quantitative form, may be regarded 
as the theory of c. As we shall see later, c is also based on the concept of 
range. The theorems of inductive logic deal not only with c but also with 
L-implication and the other L-concepts. Thus, inductive logic is likewise 
a part of semantics; it presupposes deductive logic; it may be regarded as 
constructed out of deductive logic by the introduction of the definition 
for c. In a sense, we may say that the definition of L-implication repre- 
sents the rules of deduction; in the same sense, the definition of c repre- 
sents the rules of induction. Except for this difference with respect to the 
definitions used, the procedures for constructing proofs for theorems are 
the same in inductive logic as in deductive logic. We have earlier spoken 
of proofs for theorems of the form ‘e L-implies h’ in deductive logic (see 
IIIa, method (ii)), and later of proofs for theorems of the form ‘c(h,e) = r’ 
in inductive logic (see IIIb). If we look not at the definitions used but at 
the forms of inference used in these two kinds of proof, we find that they 
are the same in both cases. Not only in proofs of theorems of deductive 
logic but also in those of inductive logic we apply the implicit deductive 
procedures which are customarily applied in the word language. Thus 
any procedure of proof in any field, also in inductive logic, is ultimately a 
deductive procedure. This does not mean, of course, that induction is a 
kind of deduction. We must clearly distinguish between theorems of in- 
ductive logic, e.g., ‘c(h,e) = 3/4’, and sentences like e and h about which 
the theorems speak. The former belong to the metalanguage; the latter 
belong to the object language and hence are not a part of inductive logic 
but its subject matter. The previous remark concerns only the former; 
it means that these theorems, although belonging to inductive logic, are 
reached by deduction. On the other hand, the relation between e and h, as 
stated by the theorem mentioned, is inductive, not deductive. No deduc- 
tive procedure leads from e to h; but, if we may say so, an inductive pro- 
cedure, characterized by the number 3/4, connects e with h. 

The far-reaching analogy which holds between inductive and deductive 
logic in spite of the important differences between these two fields was 
repeatedly emphasized in the preceding discussions. The principal com- 
mon characteristic of the statements in both fields is their independence 
of the contingency of facts. This characteristic justifies the application 
of the common term ‘logic’ to both fields. The following representation 
of examples in two parallel columns will perhaps help in further clarifying 
the analogy. 


Deductive Logic 
The subsequent statements in deduc- 
tive logic refer to these example sen- 
tences: 
Premise e: ‘All men are mortal, and 
Socrates is a man.’ 


Conclusion h: ‘Socrates is mortal.’ 


The following is an example of an ele- 
mentary statement in deductive logic: 
D1. ‘e L-implies h (in E).’ 

(E is here either the English language or a 
semantical language system based on 
English.) 


Inductive Logic 


The subsequent statements in induc- 
tive logic refer to these example sentences: 


Evidence (or premise) e: ‘The number 
of inhabitants of Chicago is three million; 
two million of these have black hair; b is 
an inhabitant of Chicago.’ 

Hypothesis (or conclusion) h: ‘b has 
black hair.’ 

The following is an example of an ele- 
mentary statement in inductive logic: 


I1. ‘c(h,e) = 2/3 (in E).’ 
Deductive Logic (continued) 

D2. The statement D1 can be established 
by a logical analysis of the meanings of 
the sentences e and h, provided the defini- 
tion of ‘L-implication’ is given. 

D3. D1 is a complete statement. We need 
not add to it any reference to specific de- 
ductive rules (e.g., the mood Barbara). 
However, the definition of ‘L-implication’ 
is, of course, presupposed for establish- 
ing D1. 

The following is a consequence of D3. 

D4. The question whether the premise e 
is known (well established, highly con- 
firmed, accepted) is irrelevant for D1. 
This question becomes relevant only in 
the application of D1 (see D6 and D7). 

D5 follows from D1: 
D5. ‘If e is true, then h is true.’ 

D6 and D7 are consequences of D1 con- 
cerning applications to possible knowl- 
edge situations. D6 represents the theo- 
retical application (that is, the result re- 
fers again to the knowledge situation); D7 
represents the practical application (that 
is, the result refers to a decision). 

D6. ‘If e is known (accepted, well estab- 
lished) by the person X at the time t, then 
h is likewise.’ [Here, ‘to know’ is under- 
stood in a wide sense, including not only 
items of X’s explicit knowledge, that is, 
those which he is able to declare explicit- 
ly, but also those which are implicitly 
contained in X’s explicit knowledge.] 

D7. ‘If e is known by X at t, then a de- 
cision of X at t based on the assumption 
h is rationally justified.’ 
Inductive Logic (continued) 

I2. The statement I1 can be established by 
a logical analysis of the meanings of the 
sentences e and h, provided the definition 
of ‘degree of confirmation’ is given. 

I3. I1 is a complete statement. We need 
not add to it any reference to specific in- 
ductive rules (e.g., for I1 a rule of the di- 
rect inductive inference). However, the 
definition of ‘degree of confirmation’ is, of 
course, presupposed for establishing I1. 

The following is a consequence of I3. 

I4. The question whether the premise (evi- 
dence) e is known (well established, high- 
ly confirmed, accepted) is irrelevant for 
I1. This question becomes relevant only 
in the application of I1 (see I6 and I7). 

There is here no analogue to D5. From 
I1 and ‘e is true’ nothing can be inferred 
(see § 10A). 

I6 and I7 are consequences of I1 con- 
cerning applications to possible knowledge 
situations. I6 represents the theoretical 
application, I7, the practical application. 

I6. ‘If e and nothing else is known by X at 
t, then h is confirmed by X at t to the de- 
gree 2/3.’ [Here, the term ‘confirmed’ 
does not mean the logical (semantical) 
concept of degree of confirmation occur- 
ring in I1 but a corresponding pragmati- 
cal concept; the latter is, however, not 
identical with the concept of degree of 
(actual) belief but means rather the de- 
gree of belief justified by the observa- 
tional knowledge of X at t.] The phrase 
‘and nothing else’ in I6 is essential; see 
§ 45B concerning the requirement of total 
evidence. 

I7. ‘If e and nothing else is known by 
X at t, then a decision of X at t based on 
the assumption of the degree of certainty 
2/3 for h is rationally justified (e.g., the 
decision to accept a bet on h with a bet- 
ting quotient not higher than 2/3).’ 
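
The arithmetic behind I1 and I7 can be made explicit by a short Python sketch. It assumes, as the example in I1 suggests, that the value of c in this case of direct inference equals the relative frequency stated in the evidence. It further assumes a simple betting scheme, introduced here only for illustration, in which a bet on h at betting quotient q gains 1 - q if h is true and loses q if h is false. The function names are hypothetical.

```python
from fractions import Fraction


def direct_inference(population: int, with_property: int) -> Fraction:
    """Value of c(h,e) in the example: the relative frequency stated in the evidence."""
    return Fraction(with_property, population)


def expected_gain(c_value: Fraction, betting_quotient: Fraction) -> Fraction:
    """Gain of a bet on h, weighted by c: win (1 - q) if h is true, lose q if h is false."""
    q = betting_quotient
    return c_value * (1 - q) - (1 - c_value) * q


c_he = direct_inference(3_000_000, 2_000_000)
print(c_he)                                   # prints 2/3
print(expected_gain(c_he, Fraction(1, 2)))    # prints 1/6
print(expected_gain(c_he, Fraction(2, 3)))    # prints 0
print(expected_gain(c_he, Fraction(3, 4)))    # prints -1/12
```

Under these assumptions the weighted gain reduces to c - q, so that it is not negative exactly when the betting quotient is not higher than 2/3, in agreement with the formulation of I7.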
It should be noticed that in inductive logic, just as in deductive logic, 
the reference to the knowledge of X does not occur in the purely logical 
statements (e.g., I1) but only in the statements of application (I6 and I7). 
It is true that statements of inductive logic, like those of deductive logic, 
are usually applied both in everyday life and in science to a premise or 
evidence that is known, i.e., well established by observations. Neverthe- 
less, it is irrelevant for the validity, as distinguished from the practical 
value or applicability, of a statement of inductive logic, just as for one 
of deductive logic, whether the evidence is true or not and, if it is true, 
whether its truth is known or not. 

We shall later (§ 55B) clarify the relation between deductive and induc- 
tive logic in still another way with the help of the concept of range. We 
shall see that a statement of deductive logic like ‘e L-implies h’ means that 
the entire range of e is included in that of h, while a statement of induc- 
tive logic like ‘c(h,e) = 3/4’ means that three-fourths of the range of e is 
included in that of h. This shows again the similarity and at the same time 
the difference between the two fields. 


§ 44. Logical and Methodological Problems 


A. With respect to deductive procedures, we distinguish between the prob- 
lems of deductive logic proper, including mathematics, and those of the meth- 
odology of deduction. The latter concern the choice of suitable deductive pro- 
cedures for given purposes. Analogously we distinguish between inductive logic 
and methodology of induction. The latter gives no exact rules but only advice 
how best to apply inductive procedures for given purposes. Bacon’s and Mill’s 
theories on induction belong chiefly, not to inductive logic, but to the method- 
ology of induction. On the other hand, the beginnings of an inductive logic are 
found in the classical theory of probability. 

B. An inductive inference does not, like a deductive inference, lead to the ac- 
quisition of a new sentence but rather to the determination of a degree of con- 
firmation. Inductive inferences usually concern a population (of persons or 
things) and samples; in many cases they deal with frequencies (statistical in- 
ferences). The principal kinds of inductive inference are briefly characterized: 
(1) direct inference, (2) predictive inference, (3) inference by analogy, (4) in- 
verse inference, (5) universal inference. 


A. Methodological Problems 


In order to clarify the aim of our construction of inductive logic, it 
seems useful to emphasize a certain distinction between two kinds of prob- 
lems. The problems of the one kind constitute the field which we call induc- 
tive logic; the problems of the other kind may be called, for lack of a better 
term, methodological problems and, more specifically, problems of the 
methodology of induction. Before explaining this distinction, let us look at 
deductive logic, where an analogous distinction can be made which is easier 
to understand. Here we have first the field of deductive logic proper, 
including pure mathematics. To this field belong, for instance, the theo- 
rems stated in §§ 20-40 above. Then there is a second field, closely con- 
nected but not identical with deductive logic. In this second field, methods 
are described for practically carrying out the procedures of deductive logic 
and mathematics, and suggestions are made for the use of these methods 
in various situations and for various purposes. Here we learn, for instance, 
how best to look for a proof of a conjectured theorem or for a simplification 
of a given proof; some hints are given as to the conditions under which an 
indirect proof may be useful; devices are explained for proving the inde- 
pendence of a certain sentence from a given set of postulates, or the con- 
sistency of the set, or its completeness; other devices are given for finding 
convenient approximating functions for the purpose of numerical calcula- 
tions (for example, T40-4 above; this theorem itself and other similar ones 
in § 40A belong to mathematics and hence to deductive logic; but the 
more or less vague general rules which tell us how to find an approximating 
function of this kind when we need it belong to the second field). This sec- 
ond field may be called methodology of deductive logic and mathematics. 

Analogously, inductive logic (in its quantitative form) contains state- 
ments which attribute a certain value of c to a certain case, that is, a pair 
of sentences e,h, or speak about relations between values of c in different 
cases. On the other hand, the methodology of induction gives advice how 
best to apply the methods of inductive logic for certain purposes. We may, 
for instance, wish to test a given hypothesis h; methodology tells us which 
kinds of experiments will be useful for this purpose by yielding observa- 
tional data e2 which, if added to our previous knowledge e1, will be induc- 
tively highly relevant for our hypothesis h, that is, such that c(h, e1 . e2) 
is either considerably higher or considerably lower than c(h,e1). Sometimes, 
not one hypothesis but a set of competitive hypotheses is given, and we 
wish to come to an inductive decision among them by finding observa- 
tional material which gives to one of the hypotheses a considerably higher 
c than to the others. In another case, we may have found observational 
results which are not explainable by the hypotheses accepted so far and 
perhaps even incompatible with one of them; here, we wish to find a new 
hypothesis which not only is compatible with the observations but ex- 
plains them as well as possible. As explained in the preceding section 
(problem I), there is no effective procedure leading to this aim, no more 
than there is in mathematics for finding a theorem suitable for a given 
purpose. Nevertheless, it is possible in both cases to give some useful 
hints in which direction and by which means to look for a result of the 
kind wanted; these hints are given by methodology. Inductive and deduc- 
tive logic cannot give them; they are indifferent to our needs and purposes 
both in practical life and in theoretical work. By emphasizing the distinc- 
tion between logic and methodology, we do not intend to advocate a sepa- 
ration of the two kinds of problems within scientific inquiry. They are 
usually treated in close connection, and that is very useful. There is hardly 
any book in mathematics—except perhaps a table of logarithms—that 
does not add to the mathematical theorems some indications as to how they 
may usefully be applied either in mathematics itself or in empirical sci- 
ence. Similarly, to our later theorems in inductive logic, we shall often 
add some remarks about their use. Some of these remarks concern the use 
within inductive logic, for instance, the utilization of a given theorem in 
proofs of later theorems; other remarks concern the use outside of induc- 
tive logic, for instance, the possibility of a practical application either of 
inductive logic in general or of a given theorem to knowledge situations. 
Remarks of both kinds belong, not to inductive logic itself, but to the 
methodology of induction. [Examples of methodological discussions con- 
cerning the application of inductive logic in general are our discussions 
of the requirements of logical independence and completeness (above, 
§ 18B), of the requirement of total evidence (below, § 45B), and the de- 
tailed discussions of the application of inductive logic for determining 
practical decisions (below, §§ 49-51); examples of methodological re- 
marks concerning the application of particular theorems to possible 
knowledge situations are found at many places in the subsequent chapters, 
e.g., in §§ 60, 61, and generally whenever in the comments on given theo- 
rems terms like ‘observation’, ‘known’, ‘unknown’, ‘expectation’, ‘pre- 
diction’, ‘decision’, ‘betting’, and similar ones occur.] However, the prin- 
cipal purpose of this book is the discussion and, if possible, solution of 
problems of inductive logic itself; in other words, the proof of theorems 
on the degree of confirmation. The discussions of problems of the method- 
ology of induction, on the other hand, are only incidental, although for 
practical reasons they may be useful and sometimes even indispensable. 
A theoretical book on geometry need not discuss in detail, if at all, the 
application of geometrical theorems for the calculation of the area of a 
garden or the distance of the moon, because the reader can be expected to 
be familiar with the connection between theoretical geometry and its ap- 
plication to spatial relations of physical bodies. In the case of inductive 
logic, on the other hand, there is at the present time not yet sufficient 
clarity and agreement even among the writers in the field concerning the 
nature of the theory and the connection between theory and practical 
application. Therefore today a book on inductive logic is compelled to 
devote a considerable part of its space to a discussion of methodological 
problems. 

One of the purposes in emphasizing the distinction between inductive 
logic proper and the methodology of induction is to make it clear that 
certain books, investigations, and discussions concerning induction do not 
belong to inductive logic although they are often attributed to it. This 
holds in particular for the works of Francis Bacon and John Stuart Mill; 
their discussions on induction, including Mill’s methods of agreement, 
difference, etc., belong chiefly to the methodology of induction and give 
hardly a beginning of inductive logic. On the other hand, the beginnings 
of a systematic inductive logic can be found in another class of works, 
some of them written a long time before Mill, although in many of these 
works the word ‘induction’ does not even occur. I am referring to all 
those works which deal with the theory of probability₁; as previously ex- 
plained (§ 12), most of the classical works on the theory of probability 
belong to this class, as do most of those modern books on probability which 
are not based on the frequency conception of probability. In most of 
these theories, probability has numerical values; hence, they are systems 
of quantitative inductive logic. Keynes's theory is an example of a com- 
parative inductive logic supplemented by a very restricted part of quan- 
titative inductive logic, since, according to his conception, probability has 
numerical values only in some cases of a special kind, while in general only 
a comparison is possible leading to the result that one hypothesis is more 
probable than another. Jeffreys starts with axioms on the primitive notion 
‘given p, q is more probable than r’, hence with a comparative inductive 
logic; on its basis, a quantitative inductive logic is constructed by laying 
down conventions for the assignment of numerical values. 


B. Inductive Inferences 


What we call inductive logic is often called the theory of nondemonstra- 
tive or nondeductive inference. Since we use the term ‘inductive’ in the 
wide sense of ‘nondeductive’, we might call it the theory of inductive in- 
ference. We shall indeed often speak of inductive inferences because the 
term is customary and convenient. However, it should be noticed that the 
term ‘inference’ must here, in inductive logic, not be understood in the 
same sense as in deductive logic. Deductive and inductive logic are analo- 
gous in one respect: both investigate logical relations between sentences; 
the first studies the relation of L-implication, the second that of degree of 
confirmation which may be regarded as a numerical measure for a partial 
L-implication, as we shall see (§ 55B). The term ‘inference’ in its custom- 
ary use implies a transition from given sentences to new sentences or an 
acquisition of a new sentence on the basis of sentences already possessed. 
However, only deductive inference is inference in this sense. If an observer 
X has written down a list of sentences stating facts which he knows, then 
he may add to the list any other sentence which he finds to be L-implied 
by sentences of his list. If, on the other hand, he finds that his knowledge 
confirms another sentence to a certain degree, he must not simply add this 
other sentence. The result of his inductive examination cannot be formu- 
lated by the sentence alone; the value found for the degree of confirma- 
tion is an essential part of the result. If we want to give a schematized 
(and hence somewhat oversimplified) picture of X’s procedure, we may 
imagine that he writes two lists of sentences; for the sake of simplicity 
we assume that the sentences of both lists are molecular. The first list 
contains the sentences which he knows; additions to this list are made in 
two ways: (a) basic sentences formulating the results of new observations 
which he makes and (b) sentences L-implied by those on the list. Only 
the additions of the kind (a) change the logical content of the list. Let us 
assume that the atomic sentences of X’s language are logically independent 
of each other (according to the requirement of independence, § 18B). 
Then X need never cross out a sentence once written on the first list. The 
second list contains inductive results. These are formulated by sentences, 
each of them marked with a numerical value, its degree of confirmation 
with respect to the first list. These values, however, hold only for a certain 
time; as soon as a new observation sentence is added to the first list, the 
numerical values on the second list have to be revised. These values could 
be provided by an inductive machine, into which the observation sen- 
tences of the first list, kind (a), are fed. (In order to make the procedure 
effective and accessible to a machine, it must be restricted to a finite 
system.) 
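
The two-list picture can be sketched in a few lines of Python. The sketch is illustrative only: the three atomic sentences, the class name `Observer`, and above all the function `c`, which simply gives equal weight to every state description compatible with the evidence, are assumptions made for the example and are not the definition of degree of confirmation adopted in this book. Deductive additions of kind (b) are omitted, since they do not change the logical content of the first list.

```python
from fractions import Fraction
from itertools import product

# Hypothetical atomic sentences of a tiny finite language (assumption).
ATOMS = ["Pa", "Pb", "Pc"]


def state_descriptions():
    """All truth assignments to the atoms of the finite language."""
    return [dict(zip(ATOMS, values))
            for values in product([True, False], repeat=len(ATOMS))]


def c(h, e):
    """Placeholder degree of confirmation of h on e.

    Equal weights on state descriptions are an assumption made for
    illustration; they are not the c-function of the text.
    """
    e_states = [s for s in state_descriptions() if e(s)]
    if not e_states:
        raise ValueError("the evidence holds in no state description")
    return Fraction(sum(1 for s in e_states if h(s)), len(e_states))


class Observer:
    """X with a first list (observations) and a second list (c-values)."""

    def __init__(self, hypotheses):
        self.first_list = []            # observation sentences, kind (a)
        self.hypotheses = hypotheses    # name -> predicate over states
        self.second_list = {}           # name -> degree of confirmation
        self._revise()

    def _evidence(self, state):
        # The conjunction of everything on the first list.
        return all(state[atom] == value for atom, value in self.first_list)

    def _revise(self):
        # The "inductive machine": recompute every value on the second list.
        self.second_list = {name: c(h, self._evidence)
                            for name, h in self.hypotheses.items()}

    def observe(self, atom, value):
        # Nothing is ever crossed out of the first list; it only grows.
        self.first_list.append((atom, value))
        self._revise()


X = Observer({
    "Pa v Pb": lambda s: s["Pa"] or s["Pb"],
    "Pa . Pc": lambda s: s["Pa"] and s["Pc"],
})
before = dict(X.second_list)        # values with an empty first list
X.observe("Pa", True)
after_first = dict(X.second_list)   # revised after one observation
X.observe("Pc", False)
after_second = dict(X.second_list)  # revised again
```

The first list only grows, while every entry of the second list is recomputed after each observation; a sentence never migrates from the second list to the first merely because its value is high.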

This picture makes it clear that an inductive inference does not, like a 
deductive inference, result in the acquisition of a sentence but in the de- 
termination of its degree of confirmation. It is in this sense, and only in 
this sense, that we shall use the term ‘inductive inference’ further on. 

The most important kinds of inductive inference or, in other words, of 
general theorems concerning c deal with cases where either or both of the 
sentences e and h give information about frequencies, for instance, in the 
form of an individual or statistical distribution (§ 26B) for some indi- 
viduals with respect to a division. In these cases we might speak of sta- 
tistical inductive inferences. 

Following the usage of statisticians, we call the class of all those indi- 
viduals to which a given statistical investigation refers the population. 
Any proper subclass of the population, defined by an enumeration of its 
elements, not by a common property, is called a sample from the popula- 
tion. The population need not necessarily consist of human beings; it may 
consist of things or events of any kind, persons, animals, births, deaths, 
molecules, electrons, specimens of grain, products of a factory, etc. The 
population is usually not the whole universe of individuals but only a part 
of it. For example, the universe may be the totality of physical things; one 
investigation may take as population the present inhabitants of Chicago, 
another may take the inhabitants of Boston in 1900, etc.; the fact that 
these and other populations are parts of the same universe of individuals 
makes it possible first to formulate these investigations in the same lan- 
guage system and also, if desired, to consider later a more comprehensive 
population containing the original ones as parts and studying their rela- 
tions. 

We shall now briefly characterize some of the most important kinds of 
inductive inference; they are neither exhaustive nor mutually exclusive. 

1. The direct inference, that is, the inference from the population to a 
sample. (It might also be called internal inference or downward inference.) 
e may state the frequency of a property M in the population, and h the 
same in a sample of the population. 

2. The predictive inference, that is, the inference from one sample to an- 
other sample not overlapping with the first. (It might also be called exter- 
nal inference.) This is the most important and fundamental inductive in- 
ference. From the general theorems concerning this kind we shall later (in 
Vol. II) derive the theorems concerning the subsequent kinds. The special 
case where the second sample consists of only one individual is called 
the singular predictive inference. We have indicated earlier (§ 41D) and 
we shall show in detail later (T108-1) that the results of the singular pre- 
dictive inference stand in a close relation to the estimation of relative 
frequency. 

3. The inference by analogy, the inference from one individual to an- 
other on the basis of their known similarity. 

4. The inverse inference, the inference from a sample to the population. 
(It might also be called upward inference.) This inference is of greater im- 
portance in practical statistical work than the direct inference because we 
usually have statistical information only for some samples actually ob- 
served and counted and not for the whole population. Methods for the in- 
verse inference (often called ‘inverse probability’) have been much dis- 
cussed both in the classical period and in modern statistics. One of the 
chief stimulations for the developments of modern statistical methods 
came from the controversies concerning the validity of the classical meth- 
ods for the inverse inference. 

5. The universal inference, the inference from a sample to a hypothesis 
of universal form. This inference has often been regarded as the most im- 
portant kind of inductive inference. The term ‘induction’ was in the past 
often restricted to universal induction. Our later discussion will show that 
actually the predictive inference is more important not only from the 
point of view of practical decisions but also from that of theoretical 
science. 


§ 45. Abstraction in Inductive Logic 


A. The application of logic, which is not a task of logic itself but of methodol- 
ogy, has to do with states of observing, believing, knowing, and the like. On 
the other hand, logic itself, both deductive and inductive, deals not with these 
states but instead with sentences subject to exact rules. Thus logic gains exact- 
ness by abstracting from the vague features of actual situations. B. In the ap- 
plication of inductive logic still another difficulty is involved, which does not 
concern inductive logic itself. This difficulty consists in the fact that, if an ob- 
server wants to apply inductive logic to an expectation concerning a hypothesis 
h, he has to take as evidence e a complete report of all his observational knowl- 
edge. Many authors on probability have not given sufficient attention to this 
requirement of total evidence. They often leave aside a great part of the available 
information as though it were irrelevant. However, cases of strict irrelevance 
are much more rare than is usually assumed. C. The simple structure of our 
language systems, the earlier requirement of completeness (§ 18B), and now 
the requirement of total evidence compel us to construct all examples of the 
application of inductive logic in a fictitious simplified form. This fact, however, 
does not prevent the approximative application of inductive logic to actual 
knowledge situations in our actual world, just as certain idealized concepts of 
physics can be practically applied. D. Abstractions may be very fruitful and 
even necessary for the progress of science, as the example of geometry shows. 
Some students reject all abstractions; others use them excessively and neglect 
certain features of reality. These extremes are harmful. We should rather com- 
bine both tendencies, that emphasizing the concrete as well as that emphasizing 
the abstract. As to inductive logic, we should overlook neither the fact that its 
ultimate purpose lies in its application in practical life nor the fact that it cannot 
be efficient without using abstract methods. 


A. Abstraction in Deductive and Inductive Logic 


Our theory of inductive logic will be applied not to the whole language 
of science with its great complexities, its large variety of forms of expres- 
sion, and its variables of higher levels (e.g., for real numbers), but only 
to the simple language systems 𝔏 explained in the preceding chapter. This 
involves a certain simplification and schematization of inductive proce- 
dures in comparison with those actually used in the practice of science. 
Other kinds of schematization here involved are still more important; 
they will be discussed in this section. The first of them is inherent in any 
logical method; it could not be avoided even if we took the whole language 
of science as our object language, and it is a necessary factor even in de- 
ductive logic. It consists in the fact that the pure systems of both deduc- 
tive and inductive logic refer simply to sentences (or to the propositions 
expressed by them) rather than to states of knowing, believing, assuming, 
etc., while any application of logic to an actual situation has to do with 
these states. This application is outside of pure logic itself; it belongs to 
the subject matter of the methodology of logic, as we have seen in the 
preceding section. 

Let us first take an example from deductive logic. One of the simplest 
theorems of deductive logic says that i L-implies i ∨ j. One kind of appli- 
cation of this theorem consists in the following rule, which is not a logical 
but a methodological rule: if X has good reasons for believing i, then the 
same reasons entitle him to believe i ∨ j. This, however, is a crude for- 
mulation using ‘believing’ as a classificatory concept. A more adequate 
formulation would use it as a quantitative or at least as a comparative 
concept: if X at the time t has reasons for a belief in i to the degree r, then 
he has at the same time reasons for a belief in i ∨ j at least to the degree r. 
For instance, I look at a tree and, on the basis of what I see, I am con- 
vinced that a certain leaf is green; then I have the right to be convinced 
at least as strongly that this leaf is green or smooth. In this way, some 
rather vague and perhaps even problematic concepts enter the situation. 
Am I actually convinced? How am I to measure the strength of my con- 
viction or at least to compare two convictions as to their strength? Is the 
color I want to express described accurately by ‘green’, or should I per- 
haps rather say ‘greenish-blue’? We have here all the vaguenesses and 
other difficulties which arise on the way from an observation to the utter- 
ance of a corresponding observation sentence and our report about the 
belief in it. Within logic, however, all these difficulties do not appear. Not 
that they have been overcome; we just leave them outside, we ‘abstract’ 
from them. The advantage of this procedure is that in logic we deal only 
with clear-cut entities without vagueness. We have predicates and they 
are assumed to designate properties, and further we have other signs and 
their designata. The actual vagueness of the boundary line between green 
and blue is disregarded and likewise the vagueness of the other properties 
and all other designata. Furthermore, logic contains other semantical 
rules determining the meaning of the sentences on the basis of these 
designata (e.g., in the form of rules of ranges, as explained in § 18D). 
With the help of these rules, we determine whether or not the relation of 
L-implication holds between given sentences, and thus we reach one of the 
chief aims of deductive logic. (For instance, we show that i L-implies 
i ∨ j by showing that the range of i is contained in that of i ∨ j.) All these 
procedures within deductive logic deal with neat, clear-cut entities ac- 
cording to exact rules and thus are not blurred by any vagueness. How- 
ever, we must necessarily pay a price for this advantage; by the abstrac- 
tion which we carry out in order to construct our system of logic, we dis- 
regard certain features; they remain outside the scope of logic. However, 
we must be careful in the characterization of this situation. Some philoso- 
phers say that, in consequence of the abstraction leading to logic or, in a 
similar way, to quantitative physics, certain features of reality (for in- 
stance, the ‘genuine qualities’ or ‘qualia’) remain forever outside our 
grasp. I do not agree with this view; although it sounds similar to what I 
said earlier, there is a fundamental difference. This may become clearer by 
the following analogy. Suppose a circular area is given, and we want to 
cover some of it with quadrangles which we draw within the circle and 
which do not overlap. This can be done in many different ways; but, 
whichever way we do it and however far we go with the (finite) procedure, 
we shall never succeed in covering the whole circular area. However, it is 
not true that—in analogy to the philosophical view mentioned—there is 
any point in the area which cannot be covered. On the contrary, for every 
point and even for every finite number of points there is a finite set of 
quadrangles covering all of them. The situation with abstraction is analo- 
gous. In any construction of a system of logic or, in other words, of a lan- 
guage system with exact rules, something is sacrificed, is not grasped, be- 
cause of the abstraction or schematization involved. However, it is not 
true that there is anything that cannot be grasped by a language system 
and hence escapes logic. For any single fact in the world, a language sys- 
tem can be constructed which is capable of representing that fact while 
others are not covered. For instance, if we find ourselves unable to describe 
a certain subtle difference between two shades of color with simple predi- 
cates like ‘green’ and ‘blue’, we may make our net finer and finer by intro- 
ducing more and more predicates like ‘bluish-green’, ‘greenish-blue’, etc., 
or by introducing quantitative scales (as in the color systems of W. Ost- 
wald or A. C. Hardy); in this way, our language becomes more and more 
precise with respect to colors. Perhaps this process of introducing more 
and more precise terms can never come to an end, so that some vagueness 
always remains. On the other hand, there is no difference in color shade, 
however slight, that remains forever inexpressible. 
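
The range argument used earlier in this subsection (i L-implies i ∨ j because the range of i is contained in that of i ∨ j) can be checked mechanically in a small finite language. The following Python sketch is only an illustration: it treats i and j as two atomic sentences, which is an assumption made for the example, and the names `range_of` and `l_implies` are hypothetical.

```python
from itertools import product

# Two hypothetical atomic sentences; in the text i and j may be any sentences.
ATOMS = ("i", "j")


def range_of(sentence):
    """Set of state descriptions (truth assignments) in which sentence holds."""
    return {values
            for values in product((True, False), repeat=len(ATOMS))
            if sentence(dict(zip(ATOMS, values)))}


def l_implies(premise, conclusion):
    """premise L-implies conclusion iff range(premise) is a subset of range(conclusion)."""
    return range_of(premise) <= range_of(conclusion)


def sentence_i(state):
    return state["i"]


def sentence_i_or_j(state):
    return state["i"] or state["j"]


print(l_implies(sentence_i, sentence_i_or_j))   # True
print(l_implies(sentence_i_or_j, sentence_i))   # False
```

Everything in the check concerns clear-cut entities (truth assignments and sets of them); the question whether X is actually convinced of i never enters.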


B. The Requirement of Total Evidence 


Suppose that inductive logic supplies a simple result of the form 
‘c(h,e) = r’, where h and e are two given sentences and r is a given real 
number. How is this result to be applied to a given knowledge situation? 
This question is answered by the following rule, which is not a rule of in- 
ductive logic but of the methodology of induction: 


(1) If e expresses the total knowledge of X at the time t, that is to say, 
his total knowledge of the results of his observations, then X is 
justified at this time to believe h to the degree r, and hence to bet 
on h with a betting quotient not higher than r. 


One of the decisive points in this rule is the fact that it lays down the 
following stipulation: 


(2) Requirement of total evidence: in the application of inductive logic to 
a given knowledge situation, the total evidence available must be 
taken as basis for determining the degree of confirmation. 


There is no analogue to this requirement in deductive logic. If deductive 
logic says that e L-implies h and if X knows e, then he is entitled to assert h 
irrespective of any further knowledge he may possess. On the other hand, 
if inductive logic says that c(h,e) = r, then the mere fact that X knows e 
does not entitle him to believe h to the degree r; obviously it is required 
either that X know nothing beyond e or that the totality of his additional 
knowledge i be irrelevant for h with respect to e, i.e., that it can be shown 
in inductive logic that c(h, e . i) = c(h,e). It cannot even be said that X 
may believe h at least to the degree r; by the addition of i, the c for h may 
as well decrease as increase. The theoretical validity of the requirement of 
total evidence cannot be doubted. If a judge in determining the proba- 
bility of the defendant’s guilt were to disregard some relevant facts 
brought to his knowledge; if a businessman tried to estimate the gain to be 
expected from a certain deal but left out of consideration some risks he 
knows to be involved; or if a scientist pleading for a certain hypothesis 
omitted in his publication some experimental results unfavorable to the 
hypothesis, then everybody would regard such a procedure as wrong. 
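
That additional knowledge i may lower as well as raise the value of c can be seen in a toy case. The Python sketch below is illustrative only: the two atomic sentences p and q, the particular sentences chosen for h, e, and i, and the function `c` (equal weights on the state descriptions compatible with the evidence) are all assumptions made for the example, not the c-function of this book.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("p", "q")  # hypothetical atomic sentences


def c(h, e):
    """Placeholder c: share of the e-states in which h holds (assumption)."""
    states = [dict(zip(ATOMS, v))
              for v in product((True, False), repeat=len(ATOMS))]
    e_states = [s for s in states if e(s)]
    return Fraction(sum(1 for s in e_states if h(s)), len(e_states))


def conj(a, b):
    """Conjunction of two sentences."""
    return lambda s: a(s) and b(s)


def h(s):                  # hypothesis: p
    return s["p"]


def e(s):                  # partial evidence: p or q
    return s["p"] or s["q"]


def i_lowering(s):         # further knowledge: q
    return s["q"]


def i_raising(s):          # further knowledge: not q
    return not s["q"]


print(c(h, e))                       # 2/3
print(c(h, conj(e, i_lowering)))     # 1/2
print(c(h, conj(e, i_raising)))      # 1
```

An observer who knows e together with one of these further sentences, but computes only c(h,e), arrives at a value that errs in one direction or the other, and without examining the omitted knowledge he cannot tell which.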

The requirement has been recognized since the classical period of the 
theory of probability. Keynes ([Probab.], p. 313) refers to “Bernoulli’s 
maxim, that in reckoning a probability, we must take into account all the 
information which we have”. Although in the second axiom referred to 
by Keynes, Bernoulli speaks in somewhat weaker terms (“everything 
that can come to our knowledge” [Ars], p. 214), the formulation of the 
third axiom (“Not only those arguments must be considered which are 
favorable to an affair but also all those which can be advanced against 
it, so that after pondering both it becomes clear which ones outweigh the 
others”, p. 215) and the examples given in connection with both axioms 
leave no doubt that the requirement of total evidence is meant. The re- 
quirement is expressed more clearly by C. S. Peirce: “I cannot make a 
valid probable inference without taking into account whatever knowledge 
I have . . . that bears on the question” ([Theory], p. 461). However, many 
writers since the classical period, although presumably acknowledging the 
requirement in theory, did not give sufficient attention to it in questions 
of practical application. Laplace himself, for instance, raised the following 
question: According to the reports of history, the sun has never failed to 
rise every twenty-four hours for five thousand years or 1,826,213 days; 
what is the probability of its rising again tomorrow morning? Using his 
rule of succession, Laplace gave the answer: 1 − 1/1,826,215. Since we 
cannot assume that he was unaware of the fact that history reports be- 
sides sunrises also a number of other events, we must conclude that he 
either regarded all other known events as irrelevant for his problem or 
failed to consider the question of relevance. Many examples of a similar 
nature were constructed. Later writers criticized these examples. Aside 
from criticisms of the methods used for the solutions, for example, the 
rule of succession, the objection was raised that series of events of this 
kind are not a proper subject matter for the theory of probability because 
we have a causal explanation for them and therefore cannot regard them 
as matters of chance. I should prefer to give this objection a different form. 
I agree with Laplace against his critics in the view that the theory of 
probability or inductive logic applies to all kinds of events, including those 
which seem to follow so-called causal laws, that is, general formulas of 
physics, for instance, in the example of the sun, the laws of mechanics ap- 
plied to the earth and the sun. On the other hand, I agree with the critical 
judgment of the later writers that Laplace’s application of the theory in 
cases of this kind is not correct because our knowledge of mechanics is 
disregarded. I would say that the requirement of total evidence is here 
violated because there are many other known facts which are relevant 
for the probability of the sun’s rising tomorrow. Among them are all those 
facts which function as confirming instances for the laws of mechanics. 
They are relevant because the prediction of the sunrise for tomorrow is a 
prediction of an instance of these laws. 
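
The figure attributed to Laplace above can be reproduced with exact arithmetic. The sketch assumes the usual statement of the rule of succession, (s + 1)/(n + 2) for s favorable cases among n observed; the function name is hypothetical. It reproduces only the arithmetic and takes as evidence nothing but the count of sunrises, which is precisely the restriction criticized in this subsection.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)


days = 1_826_213                      # reported sunrises, none of them failed
p_sunrise = rule_of_succession(days, days)

print(p_sunrise)                                  # 1826214/1826215
print(p_sunrise == 1 - Fraction(1, 1_826_215))    # True
```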

Modern authors on probability are in general more careful in the con- 
struction of their examples; but I think that even they are often not cau- 
tious enough in their tacit or explicit assumptions as to irrelevance. The 
cases of strict irrelevance are considerably more rare than is usually be- 
lieved. Later, in the construction of our system of inductive logic, ex- 
amples will be found where we might be inclined at the first look to assume 
irrelevance, while a closer investigation shows that it does not hold. 


C. The Applicability of Inductive Logic 


We have seen earlier (§ 18B) that the requirement of completeness com- 
pels us to imagine for the purpose of the application of inductive logic a 
simplified world, a universe which is not more complex in structure or 
more abundant in variety than the simple language system which we are 
able to manipulate in inductive logic. Now the requirement of total evi- 
dence compels us in the construction of examples of application to imagine 
in the simplified universe an observer X with a simplified biography. 
While every adult person in our actual world has observed an enormous 
number of events with an immense variety transcending all possibilities 
of complete description, let alone calculatory inductive analysis, we have 
to imagine an observer X whose entire wealth of experience is so limited 
that it can easily be formulated and taken as a basis for inductive pro- 
cedures. Thus, examples of the application of inductive logic must neces- 
sarily show certain fictitious features and deviate more from situations 
which can actually occur than is the case in deductive logic. This fact, 
however, does not make inductive logic a fictitious theory without rele- 
vance for science or practical life. A man who wants to calculate the areas 
of islands and countries begins with studying geometrical theorems illus- 
trated by examples of simple forms like triangles, rectangles, circles, etc., 
although none of the countries in which he is interested has any of these 
forms. He knows that by beginning with simple forms he will learn a 
method which can be applied also to more and more complex forms ap- 
proximating more and more the areas in which he is interested. Analo- 
gously, the method of inductive logic, although first applied only to ficti- 
tious simple situations, can, if sufficiently developed, be applied to more 
and more complex cases which approximate more and more the situations 
in which we find ourselves in real life. Physics likewise uses certain simpli- 
fied, idealized conceptions which would hold strictly only in a fictitious 
universe, for example, those of frictionless movement, an absolutely rigid 
lever, a perfect pendulum, a mass point, an ideal gas, etc. These concepts 
are found to be useful, however, because the simple laws stated for these 
ideal cases hold approximately whenever the ideal conditions are approxi- 
mately fulfilled. Similarly, there are actual situations which may be re- 
garded as approximately representing the ideal conditions dealt with in 
our inductive logic referring to the simple systems 𝔏. 

Suppose, for instance, that spherical balls of equal size are drawn from 
an urn; the surface of these balls is in general white, but some are marked 
with a red point, others not; some (without regard to whether they have a 
red point or not) have a blue point, others not; and some have a yellow 
point, others not. A simple inspection does not reveal other differences 
between the balls. Then we may apply our system 𝔏 to the balls and their 
observed marks; we take as individuals the balls, or rather the events of 
the appearance of the single balls, abstracting from the fact that the actual 
balls have distinguishable parts and that the very markings by which we 
distinguish them are parts of the balls. And we take the three kinds of 
markings as primitive properties as though they were the only qualitative 
properties of the balls, abstracting from the fact that a careful inspection 
of the actual balls would reveal many more properties in which they differ. 
Suppose we have drawn one hundred balls and found that forty of them 
had the property M of bearing a red point and a blue point. Suppose that 
this is all the knowledge we have concerning the balls and that we are in- 
terested in the probability of the hypothesis h that the next ball (if and 
when it appears) will have the property M. Then we shall take as our 
evidence e the observation results concerning the hundred balls just de- 
scribed. This is again an idealization of the actual situation because in 
fact we have, of course, an enormous amount of knowledge concerning 
other things. We leave this other knowledge i aside because we regard it 
as plausible that it is not very relevant for h with respect to e, that is to 
say, that the value of c(h,e), which we can calculate, does not differ much 
from the value of c(h, e . i), which ought to be taken according to the re- 
quirement of total evidence but which would make the calculation too 
complicated. (Of course, we may be mistaken in the assumption of the 
near-irrelevance of i; that is to say, a closer investigation might show that, 
in order to come to a sufficient approximation, certain other parts of the 
available knowledge must be included in the evidence; just as a physicist 
who assumes that the influence of the friction in a certain case is so small 
that he may neglect it may find by a closer analysis that its influence is 
considerable and therefore must be taken into account.) If the temporal 
order of the hundred ball drawings is known and seems to be relevant (for 
instance, if the sequence of the colors in their temporal order of appearance 
shows a high degree of regularity), then we shall include in our evidence 
the description of this order according to one of the methods earlier ex- 
plained (§ 15B). If the temporal order of the hundred drawings is not 
known (for instance, if we counted only the number of each kind without 
paying attention to the order) or if it is known but assumed to be not very 
relevant, then we shall take as evidence the conjunction of three hundred 
sentences, each of which says of one of the hundred balls whether or not it 
has one of the three primitive properties. It will even be sufficient to take 
as evidence a conjunction of one hundred sentences, each of which says of 
one of the hundred balls whether or not it is M. For certain rules of induc- 
tion or definitions of degree of confirmation, it can be shown that the 
additional knowledge contained in the three hundred sentences is strictly 
irrelevant in this case. 

Let us suppose that we have decided to take the latter conjunction of 
one hundred sentences concerning M and non-M as our evidence e. Then 
a system of inductive logic, although formulated for a simplified universe, 
may be applied to the actual knowledge situation just described. The
application consists in calculating the value of the degree of confirmation c
for the hypothesis h and the evidence e specified and taking this value as
the probability sought.
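
As a rough illustration of what such a calculation might look like, the following sketch computes a value for the hypothesis h (the next ball is M) on the evidence e (forty M among one hundred observed balls) under two candidate rules. Neither rule is put forward at this point in the text as the definition of c; the "straight rule" (observed relative frequency) and Laplace's rule of succession are assumptions chosen here only because they are familiar, and the function names are hypothetical.

```python
from fractions import Fraction


def straight_rule(s_m: int, s: int) -> Fraction:
    """Observed relative frequency of M in a sample of size s (assumed rule)."""
    if s <= 0:
        raise ValueError("the sample must contain at least one individual")
    return Fraction(s_m, s)


def rule_of_succession(s_m: int, s: int) -> Fraction:
    """Laplace's rule of succession, (s_m + 1) / (s + 2) (assumed rule)."""
    if s < 0 or not 0 <= s_m <= s:
        raise ValueError("need 0 <= s_m <= s")
    return Fraction(s_m + 1, s + 2)


# Evidence e: 100 balls observed, 40 of them M.
# Hypothesis h: the next ball is M.
s_m, s = 40, 100
print(straight_rule(s_m, s))       # 2/5
print(rule_of_succession(s_m, s))  # 41/102
```

The two rules give slightly different values on the same evidence, which indicates why the choice of a definition for the degree of confirmation is itself a problem to be settled within inductive logic and not in its application.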

It is important to recognize clearly the nature of the difficulties which 
have just been explained. They do not occur in inductive logic itself but 
only in the application of inductive logic to actual situations of knowl- 
edge; hence they belong to the methodology of induction. Like deductive 
logic, inductive logic has to do only with clear-cut entities without any 
vagueness; it deals with sentences of a constructed language system; it 
ascribes to a pair of sentences h,e a real number r as the degree of
confirmation according to exact rules. Here, as in deductive logic, the
exactness, the freedom from vagueness, is obtained by abstraction and
therefore at a sacrifice.


D. Dangers and Usefulness of Abstraction 

Some scientists and philosophers feel a strong disinclination against all 
abstractions or schematizations. They demand that any methodological 
or even logical analysis of science should never lose sight of the actual be- 
havior of scientists both in the laboratory and at the desk. They warn 
against neglecting any of the factors which a good scientist takes into
consideration in inventing and testing his hypotheses; they emphasize that
the complex judgment on the acceptability of a hypothesis cannot be
based on just one number, the degree of confirmation. I think that this 
view contains a correct and important idea. Whenever we make an ab- 
straction, we certainly ought to be fully aware of what we are doing and 
not to forget that we leave aside certain features of the real processes and
that these features from which we abstract at the moment must not 
be entirely overlooked but must be given their rightful place at some point 
in the full investigation of science. On the other hand, if some authors 
exaggerate this valid requirement into a wholesale rejection of all ab- 
stractions and schematizations, an attitude which sometimes develops 
into a veritable abstractophobia, then they deprive science of some of its 
most fruitful methods. 

The history of science is full of examples for the usefulness and immense 
fertility of abstractions. One of the most outstanding examples is geome- 
try. It was created by an act of abstraction: attention was directed toward 
the spatial properties and relations of bodies, while all other properties, 
color, substance, weight, etc., were disregarded. Then another bold step 
was taken, leading away from the world of concrete things with their di- 
rectly observable properties to a schema consisting of constructs: geome- 
try was transformed into a theory of certain spatial configurations whose 
properties are completely and exactly determined. This geometry no
longer deals with wooden or iron balls but with spheres, perfect spheres 
of which the balls are only more or less rough approximations. It deals 
with infinite straight lines, of which at best some finite segments are 
approximately represented by certain threads and edges of bodies. Both 
these steps of abstraction were taken in ancient times; we will not 
discuss here some later steps which went even much farther in the same 
direction by transforming geometry into a theory of certain sets of real 
numbers (Descartes), into a formal axiom system (Hilbert), and finally 
into a special branch of the logic of relations (Russell). The important 
point for our discussion appears already in the effect of the first two steps 
of abstraction. Today it is clear that the magnificent development of 
geometry through its history of more than two thousand years would 
have been impossible without those abstractions and that the develop- 
ment of physics would have been impossible without that of geometry. 
Thus the end result is that, not only from the point of view of the mathe- 
matician but also from that of the physicist, the abstractions in geometry 
are immensely useful and even practically indispensable. Although the aim 
remains the investigation not of the abstract configurations but of the 
observable spatial properties of concrete things, nevertheless it turns out
that abstract geometry supplies the most efficient method for this
investigation, much more efficient than any method dealing directly with
observable spatial properties. Numerous other methods of abstraction or
schematization have proved fruitful in physics. This shows that, if we 
want to obtain knowledge of the things and events of our environment as 
a help for our decisions in practical life, then the roundabout way which 
leads first away from these things to an abstract schema may in the long 
run be better than the direct way which stays close to the things and their 
observable properties. 

The situation in logic is analogous. Both in deductive and in inductive 
logic we deal with abstract schemata, with sentences which belong to 
constructed language systems and are manipulated according to exact 
rules. This is admittedly a step away from the actual situations of observ- 
ing, believing, etc., in which we find ourselves in practical life. The choice 
of this procedure is not based on the assumption that the actual situa- 
tions are unimportant and that the exact schemata are all that matters. 
On the contrary, the final aim of the whole enterprise of logic as of any 
other cognitive endeavor is to supply methods for guiding our decisions 
in practical situations. (This does, of course, not mean that this final aim 
is also the motive in every activity in logic or science.) But here, as in 
physics, the roundabout way through an abstract schema is the best way 
also for the practical aim. Some philosophers who shy away from all 
abstractions have suggested that in the logical analysis of science we 
should not make abstractions but deal with the actual procedures, ob- 
servations, statements, etc., made by scientists; we should give up the 
concept of truth as defined in pure semantics with respect to a constructed 
language system and use instead the pragmatical concept ‘accepted (or 
verified or highly confirmed) by X at the time t’; likewise, instead of the
semantical concept of L-truth (see § 20), we should use a related
pragmatical concept defined in about this way: ‘i is a sentence of such a kind
that, for any sentence j, the utterance of the conjunction i.j by X to Y
has the same effect on Y as the utterance of j alone’. A theory of
pragmatical concepts would certainly be of interest, and a further
development of such a theory from the present modest beginnings is highly
desirable. However, I think the repudiation of pure radical semantics and
L-semantics, and thereby both of pure deductive and of inductive logic, 
in favor of a merely pragmatical analysis of the language of science would 
lead to a method of very poor efficiency, analogous to a geometry re- 
stricted to observable spatial properties. Inductive logic deals with 
schemata; but it is developed not for the sake of these schemata, but
finally for the purpose of giving help to the man who wants to know how
certain he can be that his crop will not be destroyed by a drought, to the 
insurance company which wants to calculate a premium rate for life in- 
surance that is not too high but still profitable, to the engineer who wants 
to find the degree of certainty that the bridge he constructs will be able to 
carry a certain load, to the physicist who wants to find out which of a set 
of competing theories is best supported by the experimental results known 
to him. The decisive point is that just for these practical applications the 
method which uses abstract schemata is the most efficient one. 

One of the factors contributing to the origin of the controversy about 
abstractions is a psychological one; it is the difference between two con- 
stitutional types. Persons of the one type (extroverts) are attentive to and 
have a liking for nature with all its complexities and its inexhaustible 
richness of qualities; consequently, they dislike to see any of these quali- 
ties overlooked or neglected in a description or a scientific theory. Persons 
of the other type (introverts) like the neatness and exactness of formal 
structures more than the richness of qualities; consequently, they are in- 
clined to replace in their thinking the full picture of reality by a simplified 
schema. In the field of science and of theoretical investigation in general, 
both types do valuable work; their functions complement each other, and 
both are indispensable. Students of the first type are the best observers; 
they call our attention to subtle and easily overlooked features of reality. 
They alone, however, would not be able to reach generalizations of a high 
level, because abstractions are needed for this purpose. Therefore, a science 
developed by them alone would be rich in details but weak in power of 
explanation and prediction. (This is a warning to those who are afraid of 
abstractions, especially in inductive logic.) Students of the second type 
are the best originators and users of abstract methods which, when
sufficiently developed, may be applied as powerful instruments for the
purpose of description, explanation, and prediction. Their chief weakness is
the ever present temptation to overschematize and oversimplify and hence 
to overlook important factors in the actual situation; the result may be a 
theory which is wonderful to look at in its exactness, symmetry, and formal 
elegance, and yet woefully inadequate for the task of application for which 
it is intended. (This is a warning directed at the author of this book by his 
critical super-ego.) 

It seems to me that the contrast between the two types, as long as its 
expression is a controversy between thesis and antithesis, the danger of 
abstractions versus their usefulness, is futile. It may become fruitful if 
expressed as a difference in emphasis rather than in assertion; either type
emphasizes one side of the whole method of research and works as a
safeguard against its neglect. History and personal experiences show us that
either type is tempted to underestimate the value of the work of the
other type. However, it is clear that science can progress only by the
cooperation of both types, by the combination of both directions in the
working method.

The foregoing distinction of two types is a customary but obviously 
oversimplified description of the situation. Instead of speaking of two 
types, one directed toward the concrete, the other toward the abstract, 
it would be more correct to apply a continuous scale of comparison: a per- 
son X tends less toward the concrete and more toward the abstract than 
another person Y. (In other words, a comparative concept is here more
adequate than the two classificatory concepts; see § 4.) 


§ 46. Is a Quantitative Inductive Logic Impossible? 


Some students regard a quantitative degree of confirmation and hence a 
quantitative inductive logic as impossible because there are very many differ- 
ent factors determining the choice of the “best” hypothesis, and some of them 
cannot be numerically evaluated. However, the task of inductive logic is not 
to represent all these factors, but only the logical ones; the methodological 
(practical, technological) and other nonlogical factors lie outside its scope. 
Some authors, among them Kries, believe (1) that even the logical factors, for 
example, the extension, precision, and variety of the confirming material, are 
in principle inaccessible to numerical evaluation; and (2) that it is impossible 
to define a quantitative degree of confirmation dependent upon these factors. 
The first of these assertions is easily refuted. 


The different attitude of the two psychological types discussed above 
manifested itself clearly each time in the development of modern sci- 
ence when attempts were made to introduce quantitative concepts, meas- 
urement, and mathematical methods into a new field, for instance, psy- 
chology, social sciences, and biology. Those who made these attempts 
were convinced from the beginning that the application of mathematical 
methods was possible though perhaps difficult. Even if they had to admit 
that the initial steps taken were far from perfect, they were not dis- 
couraged; they did not believe that these defects were necessary, due to 
an inherent nonquantitative character of the field in question. They ex- 
pected that the method could and would be improved and that, when 
further developed, it would yield many new results unobtainable by the 
traditional methods alone. The opponents, on the other hand, believed 
either that it was impossible in principle to apply quantitative concepts 
to the special field (“How should it be possible to measure an intensive 
magnitude like a degree of intelligence, the intensity of an emotion, the
similarity of two color sensations?”) or that the quantitative method
would only furnish trivial results and could not contribute to the real un- 
derstanding of the phenomena, or even that the application of this method 
would do harm by giving a one-sided and distorted picture. The develop- 
ments in quite a number of fields have shown in the meantime that the 
proponents of quantitative methods were right in their basic idea; on the 
other hand, they would themselves admit today that certain features of 
the methods applied at the beginning were not adequate, that the method 
is in need of continuous correction and improvement, and also that it is 
advisable to keep always in mind which features of the events under in- 
vestigation are adequately represented by the quantitative concepts used 
and which are not. 

The attempt to construct a quantitative inductive logic is not quite 
analogous to the cases just discussed, since here we have to do with a field 
of logic, and there with fields of empirical science. Nevertheless, the psy- 
chological situation is similar. It is therefore not surprising that also in 
this case objections are raised against the use of the quantitative method. 
And it seems that here the opposition is even stronger than in other cases. 
Many philosophers and scientists who object neither to the abstractions 
generally involved in logic nor to the introduction of the quantitative 
method into other fields are skeptical about its application in inductive 
logic or even declare this application to be impossible. For example, Kries 
believes that numerical values of probability₁ are applicable only in
situations similar to those in games of chance, while in other cases at best a
comparative statement is possible; he says, for instance, that an
expectation based upon an inference by analogy “is always only more or less
probable. The logical relation which holds here has nothing that can be
represented numerically” ([Prinzipien], p. 26). Likewise, Keynes thinks
that probability₁ is measurable only in cases of a very special kind. More
recently, Ernest Nagel has expressed serious doubts concerning the
possibility of a quantitative concept of degree of confirmation ([Principles],
pp. 68-71). He points out the various factors which a scientist takes into 
consideration in judging and then either accepting or rejecting a proposed 
theory on the basis of given observational evidence. He explains the diffi- 
culties involved in any attempt to take into account these factors; some 
of these difficulties will be discussed in the next section. Because of the 
multiplicity of the factors involved, Nagel doubts whether it is possible to 
arrange theories in a linear order of increasing confirmation on a given 
evidence; if this is impossible, a quantitative degree of confirmation is 
obviously impossible.

Quantitative inductive logic, when fully developed—as it has not been
so far and will not be in this book—so as to be applicable to the whole 
language of physics, is intended to enable us to determine, for instance, 
which of two hypotheses in physics is more supported by the given set of 
observational results and hence, so to speak, inductively preferable. Those 
who are skeptical with respect to quantitative inductive logic point to 
the fact—and here they are certainly correct—that in the practice of 
science factors of very different kinds influence the choice of a hypothesis. 
Some seem to think that to determine this choice by a simple calculatory 
schema would be just as preposterous as to propose rules of calculation 
which are to determine for every man which of the available women is the 
best for him to marry. 

In judging objections of the kind described, it is important to be clearly 
aware of what is and what is not the nature and task of inductive logic 
and especially of its distinction from the methodology of induction 
(§ 44A). Inductive logic alone does not and cannot determine the best 
hypothesis on a given evidence, if the best hypothesis means that which 
good scientists would prefer. This preference is determined by factors of 
many different kinds, among them logical, methodological, and purely 
subjective factors. In the case of deductive logic it is clear that its task is 
not the representation of the actual procedures of thinking and forming 
beliefs by good scientists, still less of the ways in which they make their 
practical decisions. It is directed only toward one particular logical side 
of these procedures. Take, for example, a physicist who is pondering 
about logical consequences of certain premises, say, a set of well-estab- 
lished physical laws. In which direction his thinking goes, and which par- 
ticular consequence he finds step for step as he goes along, is determined 
by a great number of factors of very different kinds, for instance, certain 
new observations for which he would like to discover whether or not, and 
if so, how, they can be explained with the help of those laws, the satisfac- 
tion he expects to feel if he succeeded in refuting another physicist’s as- 
sumption, the fact that he is more familiar with certain mathematical 
techniques than with other ones, the strength and peculiar character of 
his imagination. All these factors are outside the realm of deductive logic. 
The rules of deductive logic do not in general guide his reflections; they 
can help him only at one point, in proving that a certain sentence con- 
sidered by him is actually a logical consequence of the premises, that is, 
entailed, logically implied, implicitly given with the premises. He will 
reach this result only if deductive logic is sufficiently developed and if he 
is skilful and lucky enough to find a way leading from the premises to the
conclusion in accordance with the rules. The situation with inductive
logic is analogous. If a physicist deliberates whether or not to accept one 
hypothesis rather than another one on the basis of given observational 
results, then inductive logic can be of use to him only in one respect. It 
tells him whether one hypothesis is more supported than the other one; 
and, if the inductive logic applied is not only comparative but quantita- 
tive, it tells him to what degree the hypothesis considered is supported by 
the observations; this is, so to speak, the degree of partial entailment or 
partial logical implication. And he can obtain this help only if inductive 
logic is sufficiently developed and if he is able to find a way of applying it 
to his special case. All the other factors influencing his thinking and his 
decision are outside the scope of inductive logic. Thus the task and
function of inductive logic is analogous and complementary to that of
deductive logic.

Even if we distinguish clearly the logical factors from the methodologi- 
cal and other nonlogical factors, the question of the possibility of a quan- 
titative inductive logic is still far from being settled. There remain still 
two problems: (1) Can the logical factors be measured, that is, given nu- 
merical values? (2) Is it possible to find a mathematical function of these 
numerical values which would represent the degree of confirmation, that 
is, an adequate quantitative explicatum of probability₁? These problems
are still controversial. We shall discuss the first problem in this section, 
the second in the next section. 

Some students regard as doubtful or impossible the numerical evalua- 
tion even of some of those factors which we characterize as logical. Let us 
examine, as examples, the factors mentioned in this connection by Kries. 
After discussing the inference by analogy (see the quotation above), he
speaks about the universal inductive inference which leads from experi- 
ence to laws, that is, sentences of a universal content. “Especially if a 
sentence of this kind”, he says ([Prinzipien], pp. 29 f.), “possesses a great 
variety of consequences and is applicable in many cases and hence can 
be founded on experiential results of many different kinds, then it cannot 
be denied that a numerical measure of this foundation or empirical con- 
firmation does not exist. To look for a numerical value of the certainty, 
for example, of the law of inertia or the principle of the conservation of 
energy would be an entirely illusory attempt; and the same holds for other, 
less well-established theorems of the same or other fields. For any sentence 
of this kind, extension and precision of its empirical confirmation,
richness and fertility of its applications, and no less the objections against it
which have to be eliminated by new assumptions, all these are factors
which defy in principle any numerical determination.” By saying “in
principle”, Kries indicates that he intends to disregard the difficulties 
caused by the fact that the methods of inductive logic may not yet be 
sufficiently developed at the present moment and further by the fact that 
the immense complexity of the situation with respect to his examples may 
practically prevent us from carrying out the numerical evaluation. Of the 
factors he mentions, the following are of a logical nature, and a quantita- 
tive inductive logic is therefore required to take them into account for the 
calculation of the degree of confirmation: (i) the extension of the confirm- 
ing observational material, (ii) the variety of the confirming material, 
(iii) the precision of the confirming material, (iv) the extension (and 
likewise the other factors just mentioned) of the disconfirming material 
(“the objections”, in the original: “die etwa entgegenstehenden Beden- 
ken”). In the passage quoted, Kries makes two different statements con- 
cerning these factors, constituting negative answers to the two questions 
earlier mentioned. He says (1) that “all these are factors which defy in 
principle any numerical determination” and (2) that therefore “a numeri- 
cal measure of this . . . empirical confirmation does not exist”. Now the 
great difficulty involved in (2) must be admitted; it will be discussed in 
greater detail in the next section. The assertion (1), however, seems rather 
surprising, because the contrary appears nearly obvious and fairly general- 
ly assumed by scientists. 

Let us subject this assertion to a closer examination. It says that it is 
impossible in principle to give numerical values to the factors mentioned— 
quite aside from the other question whether we can use these values for 
determining the degree of confirmation. There is first the problem of 
counting the number of confirming and of disconfirming cases for a given 
universal hypothesis h in a given observational report e. It is true, there
are some serious difficulties involved in this problem, though often over- 
looked. It is usually assumed that, for all practical purposes, it is suffi- 
ciently clear what is meant by a confirming case and by a disconfirming 
case for h, and hence what is meant by the number of cases of those kinds
occurring in e. The difficulties involved in these concepts were first
pointed out by Carl G. Hempel in his investigations of the concept of con- 
firmation, which we shall later discuss in detail (§§ 87 f.). Let us briefly 
indicate the chief difficulty. Let h be a simple law: ‘(x)(Mx ⊃ M′x)’,
where ‘M’ and ‘M′’ are molecular predicates; h may say, for instance,
that all swans are white. Let i be ‘Mb . M′b’ (‘b is a white swan’). Then
it seems natural to call b a confirming case for the law h. Let j be ‘Mc .
~M′c’ (‘c is a non-white swan’). Then it seems natural to call c a
disconfirming case for h. Now, let i′ be ‘~M′d . ~Md’ (‘d is a non-white
non-swan’). At first, we might be tempted to regard d as an irrelevant
case for h, that is, as neither confirming nor disconfirming. However, let
h′ be the law ‘(x)(~M′x ⊃ ~Mx)’ (‘all non-white things are non-swans’);
then i′ has the same relation to h′ as i to h, and hence d is a confirming case
for h′. Now h and h′ are L-equivalent; they express the same law and
differ merely in their formulations. Therefore any observation must either
confirm both or neither of them. On the other hand, if somebody who in- 
tends to test the law that all swans are white finds a non-swan, say, a 
stone, and observes that it is not white but brown, then he would prob- 
ably hesitate to regard this observation as a confirming case for the law. 
We propose to call this puzzling situation Hempel’s paradox because
Hempel first pointed it out and offered a solution for it; this will be
discussed later (in Vol. II). Hempel offers a definition for the concept of
confirming case which is supposed to overcome this and other difficulties
involved. Even if there are some doubts whether the particular definition
chosen by Hempel may not be too narrow (see below, § 88), it seems 
plausible to assume that an adequate definition can be found. At any rate, 
nobody has so far given any reasons why it should be impossible in prin- 
ciple to find an adequate definition. On the contrary, scientists speak fre- 
quently about the number of confirming cases. A physicist would say, 
for instance, that he made six experiments in order to test a certain 
law and that he found it confirmed in all six cases. A physician would re- 
port that he tried out a new drug in twenty cases of a certain disease 
and found it successful in twelve cases, not successful in five, while in 
three cases the result was not clear either way; he hereby refers to con- 
firming, disconfirming, and irrelevant cases for the hypothesis that the 
drug has a favorable effect in all cases of the disease in question. In other 
situations, the application of the concept of a confirming case would be 
less clear. This, however, shows merely that the concept is rather vague 
in certain respects; but all explicanda are more or less vague, and this 
fact certainly does not prove the impossibility of an explicatum.
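
The puzzle can be checked mechanically. The sketch below first verifies by a truth table that the single-instance forms of h and h′ agree on every assignment, and then applies the "natural" classification used in the text (a case confirms a law if it satisfies both antecedent and consequent, disconfirms it if it satisfies the antecedent but not the consequent, and is otherwise irrelevant). The function names and the encoding of M ("is a swan") and M′ ("is white") as booleans are assumptions made for illustration; this is not Hempel's definition.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p ⊃ q."""
    return (not p) or q


def h(m: bool, m_prime: bool) -> bool:
    """Instance of h for one individual: Mx ⊃ M'x."""
    return implies(m, m_prime)


def h_prime(m: bool, m_prime: bool) -> bool:
    """Instance of h' for one individual: ~M'x ⊃ ~Mx."""
    return implies(not m_prime, not m)


# h and h' have the same truth value under every assignment.
equivalent = all(
    h(m, mp) == h_prime(m, mp) for m, mp in product([True, False], repeat=2)
)
print(equivalent)  # True


def naive_case(m: bool, m_prime: bool, law: str) -> str:
    """Classify one individual relative to the formulation 'h' or "h'"."""
    if law == "h":
        antecedent, consequent = m, m_prime
    else:
        antecedent, consequent = (not m_prime), (not m)
    if antecedent and consequent:
        return "confirming"
    if antecedent and not consequent:
        return "disconfirming"
    return "irrelevant"


# b: white swan, c: non-white swan, d: non-white non-swan.
cases = {"b": (True, True), "c": (True, False), "d": (False, False)}
for name, (m, mp) in cases.items():
    print(name, naive_case(m, mp, "h"), naive_case(m, mp, "h'"))
# b confirming irrelevant
# c disconfirming disconfirming
# d irrelevant confirming
```

The classification of b and d changes with the formulation although the two formulations are equivalent, which is exactly the difficulty an adequate definition of confirming case has to remove.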

Thus let us assume, as most scientists seem to do implicitly, that the 
concept of a confirming case can be defined; the concept of a disconfirming 
case is then easily definable. Then we can determine the number of
confirming cases contained in the observational report e. If these cases are of
different kinds, we can determine the number of confirming cases of each 
kind. Then it is not difficult to define a measure for the degree of variety 
in the distribution of the cases, on the basis of the number of kinds and 
the numbers of cases of each of the kinds. If the differences between the
kinds are not only qualitative (for instance, male and female persons; or
human beings, dogs, and guinea pigs) but quantitative (for instance, per- 
sons of different age, weight, blood pressure, etc.), then the degree of vari- 
ety will also depend upon the dispersion of the cases with respect to each 
of the relevant magnitudes (measured, for instance, by the standard de- 
viation). In this way we obtain numbers characterizing what Kries calls 
the extension and the variety of empirical confirmation. In the same way, 
the extension and the variety of the disconfirming material can be numeri- 
cally determined. 
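
The text does not commit itself to a particular measure of variety. The following is a minimal sketch of one possibility, under stated assumptions: for qualitative kinds, the normalized Shannon entropy of the distribution of confirming cases over the kinds; for a quantitative magnitude, the standard deviation mentioned above. Both the choice of entropy and the sample figures are hypothetical.

```python
import math
from statistics import pstdev
from typing import Sequence


def variety_by_kinds(counts: Sequence[int]) -> float:
    """Normalized Shannon entropy of the cases over the kinds (assumed measure).

    Returns 0.0 when all cases fall into a single kind and a value close
    to 1.0 when the cases are spread evenly over all listed kinds.
    """
    nonzero = [n for n in counts if n > 0]
    if len(nonzero) < 2:
        return 0.0
    total = sum(nonzero)
    entropy = -sum((n / total) * math.log(n / total) for n in nonzero)
    return entropy / math.log(len(counts))


def dispersion(values: Sequence[float]) -> float:
    """Population standard deviation of a quantitative magnitude."""
    return pstdev(values)


# Hypothetical confirming cases among human beings, dogs, and guinea pigs.
print(variety_by_kinds([30, 0, 0]))    # 0.0
print(variety_by_kinds([10, 10, 10]))  # close to 1.0
print(variety_by_kinds([24, 4, 2]))    # between 0 and 1

# Hypothetical ages of the persons among the confirming cases.
print(dispersion([20, 35, 50, 65]))
```

Any other function that increases with the number of kinds and with the evenness of the distribution would serve the argument equally well; the point is only that such numbers can be defined.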

That Kries should regard the precision with which the observations 
fulfil the law as a factor inaccessible to numerical evaluation is still more 
surprising. This factor comes into consideration only if the law contains 
quantitative concepts, for instance, physical magnitudes, and the report e 
refers to results of measurement of these magnitudes. Methods for meas- 
uring the precision in the sense here in question were developed a long 
time ago in the branch of mathematical statistics called the theory 
of errors and are constantly applied in many branches of science; for in- 
stance, a value inversely proportional to the standard deviation is often 
taken as a measure of precision. [In our theory of inductive logic the prob- 
lem of the precision with which the observations fulfil a law does not arise 
because quantitative magnitudes do not occur in our object languages 𝔏.
It may be remarked that, once this factor is measured, it is certainly pos- 
sible in a more comprehensive system of inductive logic to take it into 
consideration for the determination of the c of the hypothesis; Jeffreys
has discussed ways of doing this ([Probab.], chap. iii).] 
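
A minimal sketch of such a measure of precision follows. It assumes the conventional form 1/(σ√2) from the theory of errors, where σ is the sample standard deviation of repeated measurements; the constant of proportionality and the measurement values are assumptions made for illustration.

```python
import math
from statistics import stdev
from typing import Sequence


def precision(measurements: Sequence[float]) -> float:
    """Precision taken as inversely proportional to the standard deviation.

    Uses the form 1 / (sigma * sqrt(2)); the constant is a convention
    assumed here, and at least two measurements are required.
    """
    sigma = stdev(measurements)
    if sigma == 0:
        raise ValueError("identical measurements: precision is unbounded")
    return 1.0 / (sigma * math.sqrt(2.0))


# Two hypothetical series of measurements of the same magnitude.
tight = [9.81, 9.80, 9.82, 9.81, 9.80]
loose = [9.6, 10.1, 9.9, 9.5, 10.0]
print(precision(tight) > precision(loose))  # True
```

The series with the smaller scatter receives the larger number, so that the precision of the confirming material is represented by a numerical value in the sense required.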

It is not quite clear what Kries means when he says that a law is “ap- 
plicable in many cases” and refers to the “richness and fertility of its 
applications”. Perhaps he means by “applications” of the law observable 
consequences; then the phrases just quoted do not refer to a new factor 
but are simply a repetition with other words of what he has said before. 
Or else he means by “applications” of the law its practically useful tech- 
nological applications. In this case the factor referred to is not logical but 
methodological or technological. Hence, for the concept of degree of con- 
firmation, it is neither required nor possible to take account of this factor. 

Our discussion has shown that the first of the two arguments by which 
Kries and other authors try to prove the impossibility of a quantitative 
degree of confirmation is rather weak and can easily be refuted. The asser- 
tion is that certain logical factors, of which it is said correctly that the de- 
gree of confirmation depends upon them, are in principle inaccessible to 
numerical evaluation. We have seen that, on the contrary, it is rather
plausible that they can be evaluated numerically. The second argument is
more serious; it will be discussed in the next section. 


§ 47. Some Difficulties Involved in the Problem of Degree of Confirma- 
tion 

There remains the problem whether the degree of confirmation can be ade- 
quately defined if the factors earlier mentioned on which it depends can be 
evaluated numerically. Our discussion tries to show that there is no sufficient 
reason for the assertion that it is impossible. However, there are serious diffi- 
culties involved. We discuss the following points: (A) the degree of confirma- 
tion for a singular prediction on the evidence of an observed frequency; fur- 
ther, the degree of confirmation for a law with respect to different bodies of evi- 
dence; (B) an evidence which contains only confirming cases; (C) a more com- 
plex evidence containing cases which might be regarded as partially confirming; 
(D) an evidence which contains confirming and disconfirming cases. E. The 
degree of confirmation for a law should also depend upon the variety of confirm- 
ing cases. F. In all these situations, the difficulty consists not in the fact that 
there is no adequate function but rather that it is not easy to see how best to 
make a choice among the infinite number of functions—a choice which would not 
seem entirely arbitrary. Whether and how the difficulties can be overcome will 
be seen later. 


After the elimination of the first of the two arguments by which Kries 
and other authors try to prove the impossibility of a quantitative degree 
of confirmation, the second argument may be formulated like this: Even 
if it is true that numerical values can be attributed to each of the factors 
earlier mentioned, on which the degree of confirmation depends, it is still 
impossible to find a definition of a quantitative concept of degree of con- 
firmation which adequately represents this dependence, because the parts 
played by the various factors differ from one another and vary with the 
situations and therefore cannot be summed up in one number. 

Although this argument does not constitute a cogent proof of the im- 
possibility asserted, the circumstances to which it refers deserve careful 
consideration, because they involve serious difficulties which any attempt 
toward a quantitative inductive logic has to meet. In this section some 
of these difficulties will be explained without making an attempt to 
solve them. The discussion is chiefly intended to make us realize how hard 
the task is that lies before us. 


A. Singular Prediction 


One of the most important and, in a certain sense, also most elementary 
problems of inductive logic concerns the singular predictive inference 
(§ 44B). If we have observed the frequency of a certain property, what is 
the probability₁ that a new object of the kind in question has this prop- 
erty? Let the evidence e available to X be a report about an observed 
sample of s (say, a hundred) individuals; e says that among them s₁ (say, 
eighty) have the property M. 

Let the hypothesis h be the singular prediction that a certain individual 
c not belonging to the sample is M. The question is, what is the prob- 
ability₁ of h with respect to e; in other words, when we look for a suitable 
definition of degree of confirmation c as an explicatum of probability₁, 
what value do we want it to attribute to c(h,e)? We might perhaps first 
think that this value should be the same as the relative frequency ob- 
served, hence c(h,e) = s₁/s. This answer seems quite natural at the first 
glance, and it has indeed sometimes been proposed and even made the 
basis of various inductive methods, which we shall later examine in de- 
tail (in Vol. II). This solution, which we call the Straight Rule, might per- 
haps be accepted as a simple first approximation, but there are reasons 
which make it doubtful whether a definition of c which leads exactly to 
this result is adequate as an explication of probability₁. This is shown by 
certain consequences to which such a definition would lead, especially in 
the case s₁ = s. In this case, in which all observed things are M, c would 
become 1. Is this an adequate value? Perhaps someone might think that 
the value c = 1 could be accepted if the number s of individuals in the 
sample is large. Suppose we accept it for s = 1,000; should we then also 
accept it for s = 20, or 3, or even 1? And, if not, where should we draw 
the line? However, it is doubtful whether the value c = 1 is acceptable 
for any s because it would mean that it would be reasonable for X to bet 
with any betting quotient, however large, on the prediction that c is M. 
It seems clear that this would not be reasonable; I should like to find some- 
body who is willing in such a case to bet with me one million dollars 
against one. To put it in another way, c = 1 means that, if X knows e, h is 
practically certain for him (and, as we shall see later, h is even logically 
certain, that is, L-implied by e, if the number of individuals involved is 
finite, as in the case under discussion); and this does, of course, not hold in 
the case described. 

Considerations of this kind, of which we can give here only brief indica- 
tions, show that, if the observed relative frequency r = s₁/s is 1, we must 
take c < r. This makes it plausible to take c < r also if r is a proper frac- 
tion sufficiently near to 1. Then, however, the difficulty arises how to 
choose the difference between c and r. If r = 1, should we take c = 0.99 or 
0.99999 or what? If r = 0.8, as in the example above, which value smaller 
than 0.8 should be taken for c? Every possible choice seems unsatisfactory 
because completely arbitrary. Laplace, in his so-called rule of succession, 
takes c = (s₁ + 1)/(s + 2). This value is 1/2 if r = 1/2, and otherwise 
always between r and 1/2; hence it fulfils the above requirement that it 
is smaller than r if r is equal to or near 1. Furthermore, this function is 
very simple and may appear to some as less arbitrary. Unfortunately, 
however, the general application of Laplace’s rule leads to contradictions, 
as we shall see later. 

Thus the situation is as follows. If e and h are of the kind described, 
c(h,e) should in some way or other depend upon the numbers s and s₁ (the 
question whether it should, in addition, depend on other numbers may be 
left aside at the moment). Now, it is easy to state functions of s and s₁ 
which would seem fairly adequate for determining c. Hence the argument 
as originally meant by Kries and others does not hold; it is not at all im- 
possible to find an adequate function for the case in question. There is 
actually a difficulty here; however, it is of a nature opposite to that as- 
serted. There is not a lack of suitable functions but an overabundance, in- 
deed an infinity of them. [For instance, for a given property M, we might 
take c = (s₁ + m)/(s + 2m) with a positive constant m arbitrarily chos- 
en; m = 0 is inadequate, as explained above (straight rule); m = 1 gives 
Laplace’s function; contradictions are avoided if for other properties 
other suitable functions are chosen.] The difficulty is that we do not know 
how to make a choice among these functions without an arbitrary and 
hence implausible stipulation. In the other points to be discussed in this 
section we shall find situations involving difficulties of essentially the 
same kind. 
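
The overabundance of candidate functions can be made concrete with a small sketch. Only the formula c = (s₁ + m)/(s + 2m) is taken from the text; the function name `c_family` and the particular values of m tried are illustrative assumptions.

```python
from fractions import Fraction


def c_family(s1, s, m):
    """Return c = (s1 + m) / (s + 2m) as an exact fraction.

    s1: number of observed individuals that are M
    s:  size of the observed sample (s >= 1)
    m:  non-negative constant; m = 0 is the straight rule,
        m = 1 is Laplace's rule of succession
    """
    if s < 1 or not 0 <= s1 <= s or m < 0:
        raise ValueError("need s >= 1, 0 <= s1 <= s, m >= 0")
    return Fraction(s1 + m, s + 2 * m)


if __name__ == "__main__":
    # The example from the text: 80 of 100 observed individuals are M.
    # The values of m below are arbitrary, chosen only for illustration.
    for m in (0, 1, 2, 10):
        value = c_family(80, 100, m)
        print(m, value, float(value))
```

Every positive m gives a value strictly between the observed frequency and 1/2 (when the frequency is not 1/2 itself), so each choice meets the requirement discussed above; nothing in the formula singles out one m over another.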


B. Confirming Cases for a Law 


In the preceding section we have seen that the task of defining the con- 
cepts of confirming case and of disconfirming case involves some diffi- 
culties but that, nevertheless, a solution seems possible. Let us now as- 
sume that we have satisfactory definitions for these concepts so that we 
can count the number of confirming and of disconfirming cases contained 
in an observational report e with respect to a hypothesis h, for instance, 
a law. 

Let us first consider situations where e contains no cases disconfirming h 
but n confirming cases. Here it seems natural to determine c(h,e) as a 
function of n (leaving aside, at present, other determining factors). It 
seems plausible that this function should be never decreasing. But which 
of the infinitely many functions of this kind should we choose? It is not 
difficult to lay down some more requirements for the function which seem 
plausible, for instance, that its relative increase by one additional confirm- 
ing case should be less for higher n. But even after this condition and 
similar ones have been stated, there is still an infinite number of functions 
left from which to choose. Many authors have regarded the task as un- 
solvable; they believe that in the situation described, which is often called 
‘induction by simple enumeration’, it is impossible to express in a quanti- 
tative way the evidential support given by the n confirming cases. I see 
no reason why this should be impossible. It must, however, be admitted 
that this point again shows that it is difficult to define c without making 
quite arbitrary decisions. 


C. More Complex Evidence 


Let us assume again that we have an adequate definition for the con- 
cept of confirming case. Then it may happen that the evidence e available 
to X does not quite suffice to make the individual b a confirming case. 
For example, let h be the law ‘(x)(Mx ⊃ M′x)’ (‘all swans are white’), 
and let e contain ‘Mb . (M′b ∨ Pb)’ (‘b is a swan and is either white or 
small’) and nothing else about b. Here, X does not know whether the 
swan b is white or not; but, still, the information that b is either white or 
small is more than nothing. Should it not count for something in weighing 
the evidence for the law h? But how much? Perhaps as half a confirming 
case? Or should it be left aside as an irrelevant case? Suppose, further- 
more, that another part of the evidence e says that, of 100 observed small 
things, 90 were white. Then the assumption that b is white becomes much 
more probable. Therefore it seems no longer justified to disregard b en- 
tirely in determining c(h,e). Although it cannot be counted as a whole 
confirming case, it must be counted in some way. Perhaps as 9/10 of a 
confirming case? Or perhaps as somewhat less than 9/10, according to the 
reasoning under (A)? At any rate, counting of whole cases does not seem 
sufficient. Thus the problem becomes rather complicated, although the 
evidence considered has still a fairly simple form. The difficulties would 
increase immensely if we were to consider more complex molecular sen- 
tences or even general sentences as the evidence or as conjunctive com- 
ponents of it. How can we hope to find a definition of degree of confirma- 
tion that gives plausible values in all these cases? 


D. Disconfirming Cases 


Now let us consider the situation where the evidence e describes a 
sample of s individuals among which s₁ violate the law (non-white swans), 
while the remaining s₂ do not. As hypothesis h we take here not the un- 
restricted law but a corresponding restricted law (D37-2b) which says 
that all swans not mentioned in e are white. Let s₂ be a fixed number, say, 
100; we consider different values of s₁. It seems plausible that c(h,e) is 
considerably less for s₁ = 1 than for s₁ = 0; but how much less? It seems 
further plausible that, with increasing s₁, c decreases monotonically; but 
in which way? Suppose that r₀ and r₁ are the values of c for s₁ = 0 and 
s₁ = 1, respectively, that is, just before and just after finding the first 
disconfirming case; then r₁ < r₀, as above. Suppose that all further ob- 
servations in an increasing number s₂ contain no disconfirming cases. 
How many additional cases are required for outbalancing the one dis- 
confirming case? That is to say, for which number s₂ does c come back to 
its original value r₀? Suppose somebody defines c in such a way that 5 addi- 
tional cases balance the one disconfirming case, while somebody else 
offers a definition according to which 5,000 additional cases are required, 
and a third definition is such that no finite number of additional cases 
brings c back to its original value. Which of these definitions—and of an 
infinite number of others—ought to be chosen? Would any decision on 
this point not seem arbitrary? 
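
How strongly the answer depends on the definition chosen can be illustrated with a rough sketch. It is not a definition of c for the restricted law, which the text does not supply at this point. As a stand-in it borrows the family (s₂ + m)/(s + 2m) from (A), read here as a value for the next swan being white; the function names and the search limit are assumptions made only for this illustration.

```python
from fractions import Fraction


def c_next_conforms(conforming, violating, m):
    """Stand-in value (conforming + m) / (total + 2m); illustrative only.

    Requires conforming >= 1 so that the denominator is never zero.
    """
    total = conforming + violating
    return Fraction(conforming + m, total + 2 * m)


def cases_to_outbalance(conforming, m, limit=10_000):
    """Count the additional conforming cases needed, after one violating
    case, before the value is back at (or above) its level before the
    violation. Returns None if that does not happen within `limit`."""
    original = c_next_conforms(conforming, 0, m)
    for k in range(limit + 1):
        if c_next_conforms(conforming + k, 1, m) >= original:
            return k
    return None


if __name__ == "__main__":
    # 100 conforming cases before the first violation, as in the text.
    for m in (0, 1, 2, 5):
        print(m, cases_to_outbalance(100, m))
```

Under this stand-in the number of cases required changes with every choice of m, and for m = 0 no finite number suffices, which mirrors the three hypothetical definitions contrasted in the paragraph.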


E. The Variety of Instances 


One of the principles of the methodology of induction says that in test- 
ing a law we should vary as much as possible those conditions which are 
not specified in the law. This principle is generally recognized, and scien- 
tists followed it long before it was formulated explicitly. The theo- 
retical justification for this methodological principle must lie in a theorem 
of either comparative or quantitative inductive logic to the effect that by 
following the principle, that is, by distributing the test cases among a 
wider variety of different kinds, a higher degree of confirmation is ob- 
tained. Therefore, a definition for c would not be adequate unless it 
yielded a theorem of this kind; hence c should, in certain situations, de- 
pend also upon the extent to which the principle of variety is heeded, that 
is to say, upon the number of different kinds from which the test cases are 
taken and the number of cases for each of these kinds. The problem is 
whether it is possible to find a definition of c such that this requirement is 
fulfilled and, moreover, fulfilled without arbitrary stipulations ad hoc. 
Nagel has clearly shown, by a detailed discussion of numerical examples, 
how great the difficulty is in fulfilling the requirement mentioned ([Prin- 
ciples], pp. 68-71). Although I do not agree with his view that this diffi- 
culty makes it impossible to find an adequate definition for c, I admit that 
the difficulty is a very serious one. In a later chapter (in Vol. II) Nagel’s 
views on this point will be discussed in detail, and then it will be examined 
with reference to his numerical examples how our system of inductive logic 
fulfils the requirement (cf. below, § 110I). 


F. The Task before Us 


The points (A) to (E) just explained are only a few of the difficulties 
which must be overcome if we are to construct an adequate quantitative 
inductive logic. It cannot be denied that these difficulties exist and that 
they are by no means negligible. However, they are not of the nature of a 
barrier thwarting our path; on the contrary, we see many open paths be- 
fore us; the difficulty consists rather in the fact that we do not know at 
present which path will be the best for attaining our aim. The task is to 
find a function which is to fulfil certain requirements; it must depend, 
under certain conditions, on certain factors in certain ways which are 
only vaguely characterized. In order to show that a solution of this task 
is impossible, it would be necessary to prove that the various require- 
ments are logically incompatible. It seems to me that the arguments of 
those who assert the impossibility are very far from proving this point or 
even making it plausible. 

How shall we approach this task? One might perhaps consider the fol- 
lowing procedure: we take up, one after the other, the points mentioned 
in this section and other ones in which there is a choice of several possi- 
bilities; at each point we take a choice which seems suitable for that spe- 
cific problem; maybe we are sometimes compelled to change an earlier de- 
cision in order to make possible a suitable decision in a later point; in this 
way we might hope to work out, so to speak, a compromise solution which 
considers the various requirements. To be more specific, we might perhaps 
think of first deciding on ways for attributing numerical values to the 
factors discussed in the preceding section and to others and then to define 
c as a function of these factors, fulfilling the requirements mentioned in 
this section. 

We shall, however, not try to solve the task in this way. A procedure 
of this kind, consisting of a series of decisions only loosely connected and 
all of them more or less arbitrary, would lead to a patchwork solution that 
in the end would not appear satisfactory to anybody; very likely, a solu- 
tion of this kind would lead to consequences, not immediately recognized, 
which are unplausible or even quite inacceptable. We shall approach the 
task from an entirely different angle. At the end of this chapter we shall 
discuss a way of laying the general foundations of a quantitative inductive 
logic (§§ 52-54). Then these foundations will actually be constructed in 
the next chapter. However, they will determine only those features of 
inductive logic which are in no way controversial, that is, those in which 
practically all workers in the field agree. Later (in Vol. II) our own system 
of inductive logic will be constructed, based on a definition of degree of 
confirmation. This definition will be reached on the basis of the common 
foundation, not by a large number of single decisions involving choices of 
particular numerical values, but by only two, so to speak, over-all de- 
cisions, decisions of a very general nature, not involving references to any 
of the factors earlier mentioned or to any numerical values (see Appendix, 
§ 110A). Afterward we shall develop the consequences of this definition 
and then examine how the particular problems explained in this section 
and many other problems are solved. Then we shall have to judge whether 
these solutions seem adequate. 

How is the adequacy of a function c, proposed as a quantitative explica- 
tum of probability₁, to be judged? The simplest approach is the following. 
We imagine a knowledge situation and describe it in a sentence e, and 
further a hypothesis which we formulate by a sentence h. We choose e 
and h such that (1) they are simple enough so that they can be formulated 
in our systems 𝔏 and the given definition of c can be applied to them, and 
(2) such that we have an intuitive impression of the value of probability₁ 
of h on e to which customary ways of inductive thinking would lead. In 
constructing these examples of application, we must be careful to make 
sure that the interpretation chosen fulfils the requirements of independ- 
ence and completeness (§ 18B) and that e fulfils the requirement of total 
evidence (§ 45B). Then we examine whether the value of c(h,e) calculated 
on the basis of the given definition is sufficiently in agreement with the 
intuitive value. Since the intuitive determination of a value is in general 
rather vague, an approximate agreement will be regarded as sufficient. If 
the calculated value differs considerably from the intuitive one, we shall 
regard the definition as inadequate in the case in question. It will seldom 
occur that a proposed definition will generally yield inadequate values. 
More frequently we shall find that it furnishes inadequate values only in 
certain special instances. In this case the definition need not be entirely 
abandoned; it may be that a suitable modification for it can be found. I 
believe that this is the case with several of those inductive methods which 
have been proposed by other authors and which will be examined, to- 
gether with our own definition, with respect to adequacy (in Vol. II). 

The discussion in this section does not claim to prove that a quantita- 
tive inductive logic is possible; it indicates merely that the arguments of 
opponents which are meant to prove its impossibility are insufficient. Thus 
the discussion is intended to remove an obstacle which might discourage 
us from even making an attempt toward the construction of a quantitative 
inductive logic. On the other hand, any overoptimism may be dampened 
by the explanation of the great difficulties involved in the task. Whether 
the attempt can and will be successful remains to be seen. 


§ 48. Is Probability₁ Used as a Quantitative Concept? 


A. Here the question is considered, not what the nature of probability₁ 
actually is or what an explicatum for it would be, but rather how people use 
the concept of probability. It appears plausible that many and perhaps most 
people in practical life and in science, who do not know any theory of proba- 
bility, use the concept of probability₁ in the following way. They use proba- 
bility₁ not only as a classificatory concept but also as a comparative concept 
(‘more probable’). Furthermore, they use probability₁ in the following cases 
even as a quantitative concept: (B) with respect to predictions of results of games 
of chance, (C) with respect to a hypothesis concerning an individual in a field 
where a relevant relative frequency is known. D. In other cases we can deter- 
mine which numerical value they implicitly attribute to a probability₁, even 
if they do not state it explicitly, by observing their reactions to betting pro- 
posals. E. If two people attribute considerably different probability₁ values to 
the same hypothesis on the same evidence, then they are inclined to offer 
theoretical arguments; this shows that they regard probability₁ as an objective 
concept. All these are merely psychological facts; they do not prove that there 
is an objective quantitative explicatum for probability₁. However, they may 
encourage us to make an attempt to find such an explicatum. 


A. Probability₁ Is Used as a Comparative Concept 


Our problem is to define an adequate quantitative concept of confirma- 
tion or, failing this, at least an adequate comparative concept. At the pres- 
ent stage of our discussions in this chapter, we do not yet know whether 
this is possible. The consideration of the difficulties explained in the pre- 
ceding section may make us rather skeptical. Now we shall look at certain 
facts which may revive some hope. These facts do not concern the prob- 
lem of probability₁ itself but only what people seem to believe about this 
concept or rather how they are inclined to use this concept. This, of course, 
provides no logical argument; but it may nevertheless be a practical factor 
influencing our expectations with respect to the solubility of the logical 
problem of an explication of probability₁. 

Before we raise the more important question concerning the quantita- 
tive concept, let us examine whether in everyday life and in science, be- 
fore an explication for probability₁ and a systematic theory is constructed, 
the concept of probability₁ is used as a comparative concept (for instance, 
in the form ‘h is confirmed by e more than h′ by e′’) or only as a classifica- 
tory concept (‘e gives confirming evidence for h’). (These concepts have 
been explained in § 8.) It seems that most authors on probability₁ agree 
that the comparative concept is frequently used in cases of the following 
kind. These cases are characterized by having only one body of evidence, 
that is, e is the same as e′, while h and h′ may be different from each 
other. For example, statements similar to the following ones are often 
made, where it is understood that the common evidence e for both hypoth- 
eses is the total knowledge of the speaker at the time of speaking: (1) ‘It 
is more likely to rain tomorrow than not’; (2) ‘Peter will probably come by 
train rather than by bus’. Comparisons of this kind are obviously neces- 
sary for all our practical decisions. Sometimes, a comparison is made for 
the same hypothesis with respect to two pieces of evidence, for example, 
(3) ‘Now, in view of today’s weather, the chances for good weather next 
Sunday are better than before’; (4) ‘By the results of Koch’s experiments 
the assumption that tuberculosis is caused by bacilli gained much in 
weight’. I believe that the customary use of probability₁ as a comparative 
concept covers a much wider field than the two special kinds of cases de- 
scribed, including many cases where e and e′ are different and even inde- 
pendent of one another (while in the examples mentioned one body of 
evidence L-implies the other) and simultaneously h and h′ are different 
and even independent. However, this point is controversial. My view on 
it will become clear from what I shall say about the quantitative concept. 


B. Probability₁ and Games of Chance 


Is the concept of probability₁ customarily used also in a quantitative 
way and, if so, under which conditions? A few authors seem to believe that 
the concept is never used quantitatively by reasonable and careful people. 
However, I think that the majority of authors believe that at least in a 
special kind of case, namely, for predictions of results of games of chance, 
numerical values are attributed customarily—and, they would add, 
rightly so—to probability₁. [Kries ascribes numerical values to probability 
only in cases analogous to those of games of chance; Keynes does it only 
in a very restricted kind of cases to which presumably those of games of 
chance belong; Nagel, as mentioned above (§§ 46 and 47E), doubts in 
general the possibility of assigning numerical values to the degree of con- 
firmation, in contradistinction to probability₂; however, he discusses 
chiefly universal theories and does not mention in this connection games 
of chance.] For example, let us consider a case of the singular predictive 
inference. Let e contain the information that a certain die is symmetrically 
built, that 6,000 throws have been made with it under the ordinary con- 
ditions, and that 1,000 of them have yielded an ace; let h be the predic- 
tion that the next throw with this die will result in an ace; then there will 
be almost general agreement that the probability₁ of h on e is (exactly or 
approximately) 1/6. It is true, there are a few theoreticians who would 
refuse to make any statement in terms of ‘probability’ with respect to h 
because, according to their conception, a probability statement with re- 
spect to a single event is meaningless; in our terminology, they recognize 
only the concept of probability₂ and believe there is no such concept as 
probability₁ or at least no quantitative concept. However, the man in the 
street and the practical scientist in the laboratory have no such scruples. 
If we give them the information e and ask them what is the probability or 
chance of h, the overwhelming majority will not hesitate to give an answer, 
and the overwhelming majority of the answers will show good agreement 
with one another. And even among those who hesitate to use here a term 
like ‘probability’ or ‘chance’, many will answer affirmatively if we ask 
them whether between two people who have the information e a bet on h of 
one against five is to be regarded as equitable. This answer shows that 
they attribute the same probability₁ value as we do and that they merely 
reject our terminology. 


C. Probability₁ and Direct Inference 

The situation is not very different in cases of the direct inference, even 
if they do not concern a game of chance. Here the evidence e contains 
suitable statistical information concerning a class to which the individual 
referred to in h belongs. Suppose that h is ‘W₂c’, that the only information 
which e gives concerning c is ‘W₁c’, and that e, furthermore, says that 
the total number of individuals with the property W₁ is 1,000 and that 
800 of them have the property W₂. For instance, e says that the person c 
is one of the 1,000 inhabitants of the village Norville among whom there 
are 800 of Norwegian ancestry; h says that c is one of the latter. Certainly 
many people will be prepared to assign here a numerical value to the 
probability₁ of h on e, and practically all of them will take the value 0.8 
and hence will regard as equitable a bet of four against one on h between 
two bettors who have no other information than e. This is justified by 
our previous discussion in § 41C. 
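
The correspondence between the two numerical values above and the bets called equitable can be checked with a short sketch. It assumes the usual reading that a bet of a against b on h has the betting quotient a/(a + b); the function name `fair_odds` is an assumption for illustration.

```python
from fractions import Fraction


def fair_odds(q):
    """Convert a betting quotient q (0 < q < 1) into odds (for, against).

    Assumes that a bet of 'a against b' on h has betting quotient
    a / (a + b). Pass q as a Fraction or an integer ratio to stay exact.
    """
    q = Fraction(q)
    if not 0 < q < 1:
        raise ValueError("betting quotient must lie strictly between 0 and 1")
    ratio = q / (1 - q)
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    # The die example: 1,000 aces in 6,000 throws.
    print(fair_odds(Fraction(1000, 6000)))
    # The Norville example: 800 of 1,000 inhabitants.
    print(fair_odds(Fraction(800, 1000)))
```

On this reading the value 1/6 corresponds to a bet of one against five and the value 0.8 to a bet of four against one, as stated in (B) and (C).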


D. Probability₁ and Betting Behavior 


Now let us consider predictions which do not concern games of chance 
and where no clearly relevant statistical information is available. Suppose 
X makes the following statement: ‘My friend Peter will probably come 
by train, not by bus’, which we interpret as an elliptical statement on 
probability₁. Perhaps we think first that his expectation is chiefly in- 
fluenced by some knowledge about relative frequencies; so we ask him: 
‘Do you think so because you know that usually many more passengers 
on this trip take train than bus?’ ‘Oh, no, I am not thinking of the other 
people; it’s just that I know my friend.’ ‘Is it then that you have ob- 
served that he takes a train much more frequently than a bus?’ ‘No, he is 
a thrifty man and often prefers a bus. However, he will probably decide 
that for this long distance a bus trip would be too tiring.’ Our question is 
whether people like X, whom we suppose to know neither the mathemati- 
cal calculus of probability nor any philosophical theories on probability, 
are usually willing to assign a numerical value to the probability, in the 
sense of probability₁, in cases like the example just given. Another example 
of this kind: ‘Thirty years from now most international conferences will 
probably use an international auxiliary language’. The chief difficulty with 
these examples is not the fact that they refer to a single event; that is like- 
wise the case with the examples under (B) and (C). The important differ- 
ence is the fact that here we cannot simply take a known relative frequency 
as the value of the probability₁ in question. Also in these cases, to be sure, 
there are relative frequencies—either exactly known or vaguely estimated 
—which belong to the relevant facts known to X and influence his prob- 
ability judgment; but he will presumably not take simply one of these 
frequencies as the probability value. 

Some authors seem to think that ordinary people like X do not attribute 
any numerical value at all to the probability in cases of this kind. It may 
indeed very well be that, if we asked people who make probability state- 
ments of this kind whether the probability asserted has a numerical value 
and whether they could express it in terms of a percentage, many and per- 
haps even most of them would answer in the negative; perhaps they would 
even be rather surprised that we expected of them such an “obviously im- 
possible” thing as measuring the probabilities in these cases. However, 
this does not prove the point. From the fact that X tells us that his prob- 
ability concept, as used in cases of the kind here discussed, is a nonquanti- 
tative concept, we cannot infer that his concept is actually nonquantita- 
tive. We have earlier (at the beginning of § 11) called attention to the 
discrepancy frequently found between what people, even scientists, say 
about the meaning and nature of their statements and terms and the way 
in which they actually use these statements and terms. Although our di- 
rect question to X does not succeed in eliciting a numerical evaluation of 
the probability, maybe his response to a certain situation which induces 
him to take a practical decision might show us nevertheless that the 
probability which he ascribes to a certain prediction has a numerical 
value. Since probability₁ means a fair betting quotient (§ 41B), we might 
offer X bets on the prediction in question with various betting quotients 
and see which of them he accepts. If X refuses to bet because he does not 
like to risk a loss, we may change the situation slightly by offering a re- 
ward instead of a bet. We can use here a device proposed by Emile Borel 
([Valeur], p. 85) and other authors; it consists in inducing X to reveal to 
us by an act of choice how he compares the probability in question to the 
probability in a simple situation of a game of chance; we assume that he 
evaluates this latter probability in accordance with the general consent. 
This may, for instance, be done in the following way with respect to the 
first example above. We promise to pay X $10 under certain conditions; 
we allow him to choose one of the following conditions (a) or (b): (a) we 
shall wait until Peter comes; if he comes by train, we shall give X $10, 
otherwise not; (b) X shall cast his die, which he and we know to be normal; 
if the result is not an ace, we shall give him $10. If he chooses (a), he there-
by implicitly reveals that he regards the probability of Peter’s coming by
train as not less than 5/6; if he chooses (b), he regards the probability
as not more than 5/6. By a series of experiments of this kind, either in the 
form of bets or of rewards, with different values as standards of com- 
parison, we find narrower and narrower intervals which include the value, 
unknown to us, of the probability₁ which X attributes to the prediction;
in other words, we measure this unknown value with greater and greater 
precision. This procedure is not only analogous to the ordinary procedure 
of measuring the value of an empirical magnitude, say, the length or 
weight of a body, but is itself an instance of such a measurement. The 
magnitude measured here is not the semantical, logical concept of prob- 
ability, or its explicatum, the degree of confirmation (for which we cannot 
speak of ‘unknown values’, see § 41D), but the corresponding pragmati-
cal, psychological concept ‘the probability or degree of belief of the pre-
diction h at the time t for X’.
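
The procedure just described, in which each choice narrows the interval containing the value that X attributes to the prediction, can be sketched as a simple bisection. The following is only an illustrative sketch under assumptions not found in the text: the function name elicit_interval, the callback prefers_prediction, and the simulated subject with a hidden degree of belief of 7/10 are all hypothetical, and X is assumed to choose consistently.

```python
from fractions import Fraction


def elicit_interval(prefers_prediction, rounds=10):
    """Narrow an interval containing X's degree of belief by repeated choices.

    prefers_prediction(q) is a hypothetical callback. It returns True when X
    chooses the reward tied to the prediction rather than the reward tied to
    a chance device whose winning probability is q (e.g. q = 5/6 for "the
    die does not show an ace").
    """
    low, high = Fraction(0), Fraction(1)
    for _ in range(rounds):
        q = (low + high) / 2  # standard of comparison offered in this round
        if prefers_prediction(q):
            low = q   # X regards the probability as not less than q
        else:
            high = q  # X regards the probability as not more than q
    return low, high


# Simulated subject (an assumption): the degree of belief, unknown to the
# experimenter, is 7/10, and every choice is made consistently with it.
hidden = Fraction(7, 10)
low, high = elicit_interval(lambda q: hidden >= q, rounds=10)
print(low, high)
```

Each round corresponds to one choice of the kind between (a) and (b); the interval is halved every time, so ten consistent choices leave an interval of width 1/1024. The sketch measures only the pragmatical, psychological magnitude, not the logical concept.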

It seems plausible to assume that many people would react to experi-
ments of this kind in a consistent way, as long as we do not take too nar-
row intervals. If this assumption is right, it means that many people do
attribute numerical values to the probability₁ of their predictions, no
matter whether or not they are able to state these values directly and 
explicitly on a direct question. In this point I am in agreement with 
Reichenbach, whose concept of weight corresponds to our concept of
probability₁ (see above, § 41E); he says: “There are a great many germs
of a metrical [= quantitative] determination of weights contained in the 
habits of business and daily life. The habit of betting on almost every 
thing unknown but interesting to us shows that the man of practical life 
knows more about weights than many philosophers will admit” ([Experi-
ence], pp. 318 f.).


E. Probability₁ Is Used as an Objective Concept


Assuming that our preceding considerations and expectations as to 
many people’s reactions are correct, they merely show certain subjective 
habits. The problem whether there is an objective concept of quantitative 
probability₁ or degree of confirmation is thereby not answered. This prob-
lem will not be discussed in this section; but we shall now briefly consider 
the question, again of a pragmatical, psychological nature, as to what is 
the ordinary people’s attitude to this problem. Here again we shall not 
ask them directly: ‘What is your answer to this problem?’ We shall rather 
try to observe whether the people’s habits of behavior reveal an implicit 
belief in an underlying objective concept of probability₁.

First let us look at the other side. How do people behave when they 
regard a certain concept as purely or chiefly subjective? Suppose X ap- 
preciates Grieg’s music much more than that of Chopin, while his friend Y 
shows the opposite preference. We see X playing his Grieg records to Y, 
trying to call his attention to certain features of them, praising them in 
words of emotional appeal, etc. We observe Y doing something similar 
with Chopin records to X. Suppose that afterward we find each of them 
expressing his preference unchanged. Still neither of them tries to prove 
by theoretical arguments that the other is wrong; rather they agree: ‘We 
seem to have different tastes’, and there the matter remains. In other 
cases of fundamentally the same nature, more objective, factual factors 
are involved. Suppose X says: ‘I should buy this house if I could get it for 
$6,000’, while Y replies: ‘I should not take it for $2,000’. There may first
be certain facts concerning the house, advantageous or disadvantageous, 
which one of the two friends has discovered and communicates to the 
other. If, however, after they have shared all relevant factual information, 
they still find that their appreciations, expressed by the price either of 
them would be willing to pay, differ considerably, then they agree again 
on their disagreement, and that settles the matter. Either of them may 
be surprised by the other’s different appreciation, but neither of them says 
that the other is wrong. We infer from the behavior of X and Y in both 
these cases that they regard a judgment of appreciation or preference in
music or in things of everyday life (except for factual components like
usefulness for a certain purpose, etc.) as subjective.

Now take a case of the other kind. Suppose X and Y look at the moon 
and discuss its distance. They do not know astronomy, and they do not 
think of any other method of measuring distances than by rods or chains; 
and they agree that here this method is impracticable because of technical 
difficulties. Thus they content themselves with making the best estimate 
they can of the distance, just from the impression they have by looking 
at the moon. X estimates the distance as one hundred miles, Y as one 
million miles. It looks at first as if the situation were similar to that in 
the former cases of appreciation of music or of a house. After each has tried 
to influence the other's opinion by calling his attention to certain features 
which he may have overlooked, they find their estimates are still un- 
changed. They agree: ‘We just have quite different estimates’, and they 
give up the hope of coming to an agreement. However, in spite of the 
similarity with the earlier cases up to this point, there is a fundamental 
difference between their attitudes here and there. Both admit that they 
have no very good reasons for their estimates; but each of them says: ‘If I 
am right, then you are wrong’. And although they have no hope of actually 
deciding the question, each of them still says: ‘Too bad that we cannot 
measure the distance by rods and do not know another method. If only we 
found a way of measuring, then the question would be decided. If the re- 
sult turned out different from my estimate, I should, of course, give up 
the estimate. Thus we should come to an agreement.’ In this way we see 
that X and Y regard distance as an objective concept. This is not altered
by the fact that their estimates differ very much, that they feel rather un- 
certain as to the accuracy of their estimates, and that they do not know 
any feasible procedure for deciding the question. 

Now we come back to the quantitative statements of probability, made 
by X, who is an ordinary man not prejudiced by any knowledge of theo-
ries on probability. Does X regard these statements as subjective or as 
objective, as merely an expression of his personal attitude like an appre- 
ciation of music, or rather as an assertion about something that is inde- 
pendent of personal taste, such that if X and Y attribute, on the basis of
their common knowledge e, different values to the probability₁ of h (con-
siderably different values, since they are in most cases merely meant as 
rough estimates) then at least one of them must be wrong? I think that 
many and perhaps most people have the latter attitude. We must, of 
course, be careful not to confuse the relativity of probability₁ with respect
to a body of evidence with subjectivity. As to the relativity, there is now
general agreement among authors on probability₁ (see above, § 10A);
hence we may suppose that X will agree with us when we explain to him
that the ordinary probability₁ statements are often elliptical and that a
complete statement has the form: ‘The probability₁ of the hypothesis h
with respect to the evidence e is r’. Thus, if X says: ‘The probability that 
it will rain tomorrow is 1/2’, and Y says: ‘The probability that it will 
rain tomorrow is 3/4’, then it is possible that both are right. This ob- 
vious fact should not be referred to as subjectivity of probability, as 
earlier authors have often done, but rather as relativity, here concealed 
by the omission of references to the evidence. If the two elliptical state- 
ments are made complete by the insertion of references to the evidence 
meant, viz., the knowledge of X and the knowledge of Y, respectively, 
then the appearance of a contradiction disappears. The question of sub- 
jectivity or objectivity must be raised with respect to these complete 
statements. Suppose X and Y make two probability₁ statements not only
for the same hypothesis h but also with respect to the same evidence e.
This may happen either if both have the same relevant knowledge, for 
instance, by pooling their information, or if they do not take as evidence 
their own knowledge but something else, say, a fictitious state of knowl- 
edge. Thus, for instance, X may say to Y: ‘If we did not know of the per-
son c, as we actually do know, that he is not of Norwegian ancestry, but 
knew only that he is an inhabitant of Norville, and if, in addition, we had 
our present knowledge that among the 1,000 inhabitants of Norville there 
are 800 of Norwegian descent, which value r should we attribute to the 
probability of the assumption that c is of Norwegian descent? In other 
words, which betting quotient r would be equitable between us?’ Suppose 
that X himself answers this question with ‘r = 4/5’ and Y with ‘r = 1/2’. 
The decisive point now is the reaction of X to this divergence. If he says: 
‘Well, we seem to differ here, just as we do in our tastes in music; and 
that’s all there is to it’, then this shows that he regards probability₁ as a
subjective concept. If, on the other hand, he offers theoretical arguments 
with the professed intention of refuting the value stated by Y, then this 
shows that he regards probability₁ as objective. And the same holds even
if X reacts only in the following much weaker way: ‘Well, I feel I am right 
but I am not quite sure. And I am not clever enough to find arguments 
which might convince you. Hence our disagreement remains unsolved, 
just as our disagreement concerning the distance of the moon. One thing
is sure though, here as there; if I am right, then you are wrong.’ I think
that most people, including practically working scientists, would react in
this latter way, and hence regard probability₁ as objective.

Even if this assumption is right, it is no more than a certain historical,
psychological fact: many people show certain behavior habits which re-
veal an implicit, underlying belief in the objectivity of probability₁. From
this fact it does, of course, by no means follow that probability₁ is actually
an objective concept or that it is possible to find an objective concept
which is an adequate explicatum for probability₁. On the other hand, the
fact that many reasonable persons think and act successfully on the basis
of an implicit belief in an objective concept of probability₁, although they
are not able to give a definition of it, may give us some hope of finding an 
objective quantitative explicatum in spite of the difficulties explained in 
the preceding section. 


§ 49. The Question of the Usefulness of Inductive Logic 


A. Theoretical usefulness. If a quantitative inductive logic can be constructed 
either for simple language systems, as will be done in this book, or for the whole 
language of science, what help would it give to work in empirical science? The 
use of inductive logic in science is similar to that of deductive logic. In many 
cases, the situation is too complicated for an application of inductive logic. In 
other cases, however, application is practically possible. This holds especially 
for the cases of inductive inference in which the evidence or the hypothesis or 
both are of a statistical nature. Inductive logic, if sufficiently developed, will
serve as a logical foundation for the methods of mathematical statistics. We 
see today the first steps in this direction, which, if continued, will lead to greater 
clarity and exactness of the basic concepts of statistics. The development of 
inductive logic will furthermore help in clarifying the problems of the nature 
and validity of inductive reasoning. 

B. Practical usefulness. The value of an empirical magnitude, for example, 
the length of a rod, will often be an important factor in determining the deci- 
sions of a person X, provided X knows this value. If he does not know it, he has 
instead to take an estimate as the basis of his decision. It is often said that prob- 
ability is a guide of life. For which of the two probability concepts does this 
hold? The statements concerning probability₂, the relative frequency in the
long run, are empirical like those about length. Such a statement can serve as a 
basis for a practical decision only if it is known. However, it can never be 
known directly if probability₂, according to the customary conceptions, refers
to an infinite population and is explicated as a limit. Therefore X must base
his decision on an estimate of probability₂, hence a value of probability₁. It
becomes clear that neither empirical science alone nor inductive logic alone
can serve as a guide of life, but only both in co-operation.


Although we do not yet know whether our aim, a system of inductive 
logic, can be reached, it is worth while to consider the question whether 
and how such a system would be useful if it could be constructed. Some
philosophers and scientists are skeptical in this respect. If their doubts 
were right, it would be a waste of time to try to construct a system of in- 
ductive logic. But there are good reasons against their doubts. These will
now be discussed. Let us assume hypothetically, for the sake of this dis-
cussion, that it is possible to construct a system of quantitative inductive 
logic, based on a concept of degree of confirmation as a quantitative ex- 
plicatum for probability₁, first for simple languages like our systems 𝔏,
and then extended to languages containing quantitative concepts, for 
example, a systematized language of physics with real numbers as space- 
time coordinates and with signs for mathematical and physical functions. 
We shall now discuss the question of the usefulness of this system in two 
respects: (A) What assistance will this system give in the field of theoreti-
cal work, especially in empirical science? (B) How could the system be
used in making practical decisions? 


A. Theoretical Usefulness of Inductive Logic in Science 


The possibility of applying inductive logic in science and also the limita- 
tions to this application, some of them essential but others merely techni- 
cal, can best be clarified by the analogy with deductive logic. Scientists 
carry out their deductive inferences in most cases, especially where mathe- 
matical transformations are not yet involved, in an intuitive, instinctive 
way, that is, without the use of explicitly formulated rules of logic; and 
they are in general quite successful in doing so. Therefore we cannot ex- 
pect that the development and systematization of deductive logic should 
have the effect of immediately increasing the correctness or efficiency of 
the inferential procedures of the scientist. Many cases with which he has 
to deal in his work are so simple that the use of explicit logical rules is un- 
necessary. In other cases, the premises with which he works are so com- 
plex that he is either not able or not willing to take the trouble of formu- 
lating them explicitly and exhaustively; this may sometimes not prevent 
him from recognizing—with more or less clarity and more or less certainty 
—that a given conclusion follows from the premises; but it prevents the 
application of explicit rules. On the other hand, there are certain cases 
where deductive logic has proved to be very useful for the scientist, es-
pecially since its very extensive development in these last hundred years;
and we may expect the number of these cases to increase with the further 
development. For instance, the axiomatic method in its more exact mod- 
ern form has been possible only on the basis of modern logic; and this 
method becomes more and more important in mathematics and its ap- 
plications, and also in physics and other fields of science. Furthermore, I
think we may assume that certain errors in deductive procedure which 
have earlier been made in science would have been avoided if the methods
of modern logic had been available at that time. Prominent examples are
the alleged deductions of Euclid’s parallel axiom from the other axioms; 
if one of the most fertile fields of modern logic, the logic of relations, had 
been known at that time, it would have prevented those errors because it 
makes it possible to represent the derivation of a conclusion from the 
axioms in an exact, formal way, avoiding the earlier pitfalls of a nonformal 
method, especially the inadvertent use of an additional, nonformulated 
premise on the basis of intuition. 

The situation with inductive logic is similar. There is first its essential 
limitation to logical factors, to the exclusion of methodological factors 
(§ 44A). This limitation does by no means make inductive logic useless, 
for, if it gives to a scientist a numerical value of the degree of confirmation 
which embodies all logical factors, it thereby does not prevent him from 
taking into consideration for his decision also as many nonlogical factors 
as he wants to; on the contrary, it facilitates this task. However, there are 
many situations in science which by their complexity make the applica- 
tion of inductive logic practically impossible. For instance, we cannot ex- 
pect to apply inductive logic to Einstein’s general theory of relativity, to 
find a numerical value for the degree of confirmation of this theory (or, 
rather, of an instance of it, § 110G) on the basis of the whole observa-
tional material known to physicists at the time when the theory was first 
stated, or for the increase in the degree in consequence of the observations 
of the solar eclipse of 1919. The same holds for the other steps in the revo- 
lutionary transformation of modern physics, especially those in connec- 
tion with quantum theory. In all these cases the relevant observational 
material is immensely extensive; it is not at all restricted to those crucial 
experiments which we usually associate with the origin of the new theories. 
Furthermore, the structure of the new physical theory in each of these 
cases is so comprehensive and complicated that no physicist at any stage 
in the development has given a complete and exact formulation of it (ac- 
cording to the rigorous standards of modern logic), let alone a complete 
and exact formulation of the observational evidence. Therefore an appli- 
cation of inductive logic in these cases is out of the question. 

On the other hand, there are also cases in which there are good reasons 
for the expectation that the application of inductive logic will become use- 
ful for the scientist, or in which the useful application is already possible 
today. This holds especially for those fields of science where statistical 
methods are used for the description of distributions of certain properties. 
As we shall see later, the inductive inferences (§ 44B) are of special im- 
portance in the form of statistical inferences, that is, in cases where the
hypothesis or the evidence or both give statistical information, for in-
stance, by stating relative frequencies. Suppose a scientist knows the sta- 
tistical distribution of certain properties within a given population (of 
persons or bacteria or atoms or whatever else) and, on this basis, wants to 
find out the probability₁ of a certain assumption as to their distribution
in an as yet unobserved sample (direct inference); or, conversely, the dis- 
tribution in a sample is known and a hypothesis is made concerning the 
distribution either in the whole population (inverse inference) or in another 
sample (predictive inference); for these and similar cases of statistical in-
ferences, inductive logic can be of immediate help. 

Many of the methods of mathematical statistics are essentially induc- 
tive methods, especially those which have been developed during the last 
decades and have found very fruitful application in agriculture, medicine, 
industrial production, insurance, and many other fields, among them 
methods of estimation, curve-fitting, significance tests, etc. These meth- 
ods, as applied today by most statisticians, are usually not based on a 
system of inductive logic, but developed independently. Similarly, de- 
ductive mathematics (arithmetic, analysis, theory of functions, infinitesi- 
mal calculus, etc.) was first developed independently of logic for more than 
two thousand years. Finally, Frege, Russell, and Whitehead succeeded 
in basing the concepts and principles of mathematics on those of deductive 
logic and thereby making mathematics a part of logic itself. Although this 
achievement changed hardly anything in the content of mathematics, it 
was very important because it established mathematics for the first time 
on a solid foundation and contributed greatly to the clarity and exactness 
of the basic concepts of mathematics. It is obvious that this achievement 
was possible only through the utilization of symbolic logic. In my view, 
the situation with inductive statistics is quite analogous. If it is possible 
to construct quantitative inductive logic to the extent indicated at the 
beginning of this section, again, of course, with the help of symbolic logic, 
then it will be possible to base statistics upon it and thereby make it a part
of inductive logic. (Obviously, this holds only for the inductive part of 
statistics, the theory of statistical inference, as distinguished from the 
deductive part, usually called descriptive statistics, which belongs to (de- 
ductive) mathematics and hence is part of deductive logic.) It may be 
expected that mathematical statistics will thereby gain for the first time
a solid foundation, a systematic unity of its various methods, and a clarity 
and exactness of its basic concepts. In spite of the great wealth in methods 
and results achieved in modern mathematical statistics, and especially its 
great fruitfulness in practical application, it is clearly in need of the theo-
retical virtues just mentioned, more urgently than deductive mathematics
was before Frege.

The system of inductive logic which will be developed in this book has
by far not yet the extension indicated above. But even in this limited do- 
main it will be possible to introduce a general concept of estimation and 
to find with its help some simple but new and important results concern- 
ing the predictive and inverse estimates of relative frequency (chap. ix). 
And in the same limited domain the founding of statistical methods on the 
basis of inductive logic will in certain cases even lead to corrections in 
some general theorems and, consequently, in numerical results. It will be 
shown (in Vol. II) that certain numerical values obtained by some meth- 
ods widely used today in mathematical statistics are not quite adequate 
and that the values supplied by the methods of our inductive logic are 
more adequate. This holds, for example, for predictive and inverse esti- 
mates of relative frequency based on small samples. From a practical 
point of view, these corrections are of minor importance because the 
numerical difference is small for samples of those sizes with which statis- 
ticians usually work. But from a theoretical and fundamental point of 
view, the fact of this correction is interesting because it means a change, 
though only a slight one, in content. [One would have an analogue in the 
reduction of deductive mathematics to deductive logic if, for example, 
Frege in the course of his logical work had found that certain results ob- 
tained by earlier uncritical uses of divergent series had to be corrected, 
a discovery which actually was made already by A. L. Cauchy (1823).]

Jeffreys was the first, and is so far the only one, to attempt a solution
of the difficult problem of founding mathematical statistics on a system 
of inductive logic comprehensive enough to be applied to the quantitative 
language of physics. He came to this problem not from logic but from 
empirical science. His work in fields of science where statistical methods 
are frequently applied, above all in his special field of geophysics, showed 
him the necessity of a theory of probability₁ sufficiently developed to serve
as a logical foundation for the use of statistical methods (see his [Probab.], 
Preface). Throughout his work he emphasizes the requirement that a sys- 
tem of inductive logic must be applicable in the actual work of scientists, 
and he himself gives numerous examples for the application of his methods 
to special problems in geophysics and other branches of physics. It seems 
to me that Jeffreys’ examples provide ample illustration for the usefulness 
and even indispensability of inductive logic for the practical work in em-
pirical science. Irrespective of whether or not we agree with all details of
his method, there can be no doubt that he has done valuable pioneer work
in bridging the gap between inductive logic and the domain of statistical
methods dealing with quantitative physical magnitudes. [We may leave
aside here the objections which we shall raise in a later chapter against cer- 
tain features which Jeffreys’ theory has in common with the classical the- 
ory of probability; we shall show in the construction of our theory how 
the difficulties here involved can be overcome; the present discussion con- 
cerns not the correctness of a particular theory but the usefulness of in- 
ductive logic in general, provided a good theory can be found.] 

It seems to me that there is still another direction in which the develop- 
ment both of deductive and of inductive logic becomes important for sci- 
entific thinking in general. The development of deductive logic not only 
has made possible the application in numerous concrete cases but has, in 
addition, thrown light on certain fundamental problems of a more general 
nature. Seen from a historical and psychological angle, it has been a side 
effect of the development of modern deductive logic—though, from a phil- 
osophical point of view, it may be regarded as an achievement of out- 
standing importance—that today we have a better understanding of the 
foundations of deductive inference, of the reasons for its validity, and of 
the nature of the sentences which state purely logical connections. There- 
by also remarkable progress has been made in the clarification of the na- 
ture of mathematics and especially of the relation between mathematics 
and empirical science. I believe that, in a similar way, the development of 
inductive logic will, over and above the applications in concrete cases, 
yield results of a more general, we might say, a philosophical character: a 
clarification of the foundations of induction (in the wide sense in which 
we use this term), of the presuppositions of induction, which are hardly 
ever made explicit, and the meaning and conditions of its validity. This
includes the old, much-debated but still controversial question concerning
the justification of induction or of special kinds of inductive inference, for 
example, those mentioned earlier. It belongs to the aims of this book not 
only to construct a system of inductive logic but also to contribute to the 
clarification of these more general problems. In both respects, this book 
cannot do more than take a few steps. I am convinced that the future de- 
velopment will soon not only improve the technical methods of inductive 
logic and widely extend their scope, but simultaneously also increase our 
insight, today still clouded in many points, into the nature and validity 
of inductive reasoning. 


B. Practical Usefulness of Inductive Logic: Probability as a Guide of Life 


Since the earliest beginnings of the development of the calculus of prob- 
ability the mathematicians and philosophers who worked on it empha-
sized its applicability to practical problems. At first the field of applica-
tion was chiefly that of games of chance; the calculus claimed to provide 
methods by which a gambler could calculate the chances in a game and 
thereby determine under what conditions it would be advisable to accept 
an offered game or a bet, and to judge whether the rules of the game were 
fair, that is, not favoring any of the players. Soon it was recognized that 
the decisions made in more serious affairs, individual decisions in private 
life or political decisions in the life of a community, are not different in 
principle from those made in a game; the situations here are more com- 
plicated and cannot be analyzed as easily into their determining factors, 
and the number of relevant factors is often much greater. But this differ- 
ence in complexity seems to be merely a difference in degree. Therefore, 
it was hoped that, as soon as science would furnish a more thoroughgoing 
analysis of the laws of nature and society, the calculus of probability 
would become one of the most efficient instruments of the human mind, 
helping one to find in any given situation the most reasonable decision, 
that is, the decision giving the best hope of success. The authors during 
the period of the Enlightenment were most optimistic in this respect. 
Contemporary authors agree in principle but are usually more moderate 
in their expectations concerning the benefits to be obtained by the appli- 
cation of probability. On the other hand, they are able, within certain 
limited fields, to speak not only of hopes but of accomplished results.
They can point out the many fruitful applications of probability consid- 
erations and statistical methods based upon probability in such various 
fields as insurance, public health, genetics, theoretical physics, astronomy, 
the design of agricultural experiments, quality control in industrial mass 
production, the analysis of economic trends and of personality factors, 
and many more. These applications lead not only to theoretical results 
but also to practical decisions concerning insurance rates, public health 
measures, the choice of special breeds of wheat, changes in methods of 
mass production and inspection, etc. 

The basic fact that makes inductive logic useful and even necessary for 
obtaining rational decisions is the impossibility of knowing the future 
with certainty. Any man X has to base his decisions on expectations con- 
cerning events which are independent of his actions and also concerning 
events which might happen in consequence of certain acts which he might 
decide to carry out. For expectations of both kinds, X has no certainties 
but only probabilities. And if his decision is to be rational, it must be 
determined by these probabilities. “To us probability is the very guide 
of life”, as Bishop Joseph Butler said (in the Preface of The analogy of 
religion [1736], quoted from Keynes [Probab.], p. 309).

Since we have found two concepts of probability fundamentally differ-
ent in nature, the question arises as to what part each of them plays in
determining practical decisions. Those who want to restrict the theory of
probability to probability₂, the frequency concept, believe that only this
concept can be of help in practical life. Their principal argument for this
belief is the fact that only a statement on probability₂ says something
about the facts of nature, while a statement on probability₁, being purely
logical, has no factual content. This characterization of the two concepts
is certainly correct, but it remains to examine the question whether the
conclusion follows that the logical concept of probability₁ is not appli-
cable for practical purposes.

According to our previous discussion (§ 41D), the distinction between
a probability₂ statement for a property M and a probability₁ statement
for a singular hypothesis concerning M may be regarded as a special case 
of the general distinction between the following two kinds of statements: 
(1) a statement about the actual value of a physical magnitude in a given 
case, a value which is either unknown to the observer or at least not 
known exactly, and (2) a statement about the estimate of this value with 
respect to given evidence. Let us consider an example of a familiar kind 
for this distinction; this may help us in clarifying the situation with re- 
spect to the two probability concepts. Let us suppose that the evidence e 
available to the observer X contains the information that the length of a 
given rod has been measured three times, with the results, say, 80.0, 80.1, 
80.5. Let us assume that the measurements were made under the same
conditions. Then there is no reason for regarding any one of the three 
results as more reliable than any other. Therefore X will take as the 
estimate of the length of the rod the arithmetic mean of the three values, 
that is, 80.2. He cannot assert with certainty that the actual length is 80.2 
(not even if this figure is understood as an abbreviated expression for the 
interval 80.15-80.25). The value 80.2 is merely an estimate; that means, 
it is a guess; not an arbitrary guess but a reasonable guess. It is indeed 
the best guess the observer can make in the present situation, as long as 
no results of further measurements are available to him. Now let us com- 
pare the following two sentences which occur in this example; the first 
belongs, not to our language systems 𝔏, but to the more comprehensive,
quantitative language of physics: 

(1) ‘The actual length of the rod is 80.2.’ 

(2) ‘The estimate of the length of the rod with respect to the given evi-
dence e is 80.2.’


The sentence (1) is an empirical sentence; it has factual content. (We
need not discuss in detail the problem of its exact interpretation in terms
of observations; it may be interpreted, for example, as saying that the
arithmetic mean of the results of the first n measurements would, with
increasing n, converge toward 80.2.) The second sentence, on the other 
hand, is analytic. It is based upon a definition of the concept of estimate. 
(This definition may be similar to, but more complicated than, the one 
indicated in § 41D (3) because of the occurrence of a magnitude with a 
continuous scale of values.) Let us assume that this definition is con- 
structed in such a manner that it implies that, for simple cases like the 
one under discussion, the estimate is the mean of the observed values. 
The sentence (2) cannot be either confirmed or disconfirmed by any future 
observations. Even if the results of future measurements tend toward a 
value considerably different from 80.2, it still remains true that 80.2 is 
the estimate with respect to the evidence e containing the three values stated 
earlier. 
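
The arithmetic of the rod example can be sketched as follows. This is only an illustration, assuming (as the text does for simple cases) that the estimate is the arithmetic mean of the observed values; the function name and the two later readings are hypothetical and not part of the original example.

```python
from fractions import Fraction

def estimate_from_measurements(values):
    """Arithmetic mean of the observed values (assumed definition of the
    estimate for this simple case)."""
    exact = [Fraction(v) for v in values]
    return sum(exact) / len(exact)

# Evidence e: the three measurements given in the text, as exact decimals.
evidence_e = ["80.0", "80.1", "80.5"]
estimate_on_e = estimate_from_measurements(evidence_e)
print(float(estimate_on_e))  # 80.2

# Hypothetical later readings change the estimate on the enlarged evidence,
# but the estimate with respect to e itself stays what it was.
enlarged_evidence = evidence_e + ["81.0", "81.2"]
estimate_on_enlarged = estimate_from_measurements(enlarged_evidence)
print(float(estimate_on_enlarged))
```

The estimate is computed from the evidence alone, which is why no later observation can refute the statement about the estimate with respect to e.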

Let us suppose that X has to make a practical decision concerning the 
use of the given rod, a decision which depends upon the length of the rod. 
Then he may act in certain respects as though he knew that the length 
was 80.2. Now let us analyze the theoretical basis of this behavior. This 
is not meant as a psychological question concerning the actual process by 
which X arrives at his decision, but rather as a rational reconstruction 
of this process. How does X utilize the sentences (1) and (2)? We might
perhaps be tempted to say that he must make use of (1) rather than (2), 
because only the sentence (1) can tell him what the actual length is. 
X would certainly make use of (1) if this sentence were known to him. 
However, in the situation assumed in our example, X does not know the 
actual length but only the results of the three measurements. Sentence (1) 
is at the present moment for X neither certain nor even probable, that is 
to say, it does not follow from the observational results expressed by e 
and is not even highly confirmed by e. [Under certain plausible assump-
tions concerning a concept of degree of confirmation c as an explicatum
for probability₁, it can be shown that, for the hypothesis that the actual
length is exactly 80.2, c on e is 0; and for the hypothesis that the actual
length is between 80.15 and 80.25, c on e is considerably less than 1/2.]
With respect to sentence (1), X can do nothing else but wait and see in
which direction future observations will point; they may highly confirm
it and hence suggest its acceptance or highly disconfirm it and hence sug-
gest its rejection. Therefore X cannot find a theoretical basis for his de-
cision in sentence (1). But he finds it in sentence (2), because this sen-
tence is analytic and hence both true and known to him; and, added to
his evidence e containing the results of the three measurements, it states
the estimated value 80.2 which determines his decision. 

Generally speaking, situations of this kind may be characterized as 
follows. Practical decisions of a man are often dependent upon values of 
certain magnitudes for the things in his environment. If he does not know 
the exact value, he has to base his decision on an estimate. This estimate 
is given in a statement of the form: ‘The estimate for the magnitude in 
question with respect to such and such observational results is so and so.’ 
This statement is purely analytic. Nevertheless it may serve as a basis for 
the decision. It cannot, of course, do so by itself, since it has no factual 
content; but it may do so in combination with the observational results 
to which it refers. 

Now let us return to the problem of the concept of probability₁. The
situation here is to some extent analogous to that in the example just dis-
cussed. Suppose that X has taken a sample of eighty persons from the
population of Chicago and has found that sixty of these persons possess
a property M. This constitutes his present evidence e. Let h be a singular
hypothesis, namely, the prediction that the person b taken at random
from the nonobserved part of the population will be found to have the
property M. For the present discussion the exact value of the
probability₁ of h on e does not matter. It seems plausible that this value does
not differ much, if at all, from the relative frequency of M in the observed
sample, which is 3/4. To make the example more concrete, let us arbi-
trarily assume that the probability₁ of h on e is 0.73. [The reason for choos-
ing here a value not equal to but slightly different from the observed rela-
tive frequency is merely the intention of stressing the fact that the esti-
mate to be discussed is equal to the value of the probability₁, here 0.73,
and not necessarily equal to the observed relative frequency, here 3/4.]
Now let us compare the following sentences concerning the present ex- 
ample; we shall see that they are analogous to the earlier sentences con- 
cerning the actual length of a rod and the estimate of its length. 

(3) ‘The actual relative frequency of M in the population of Chicago
is 0.73.’

(4) ‘The probability₁ of the singular hypothesis h with respect to the
evidence e concerning the observed sample is 0.73.’

According to our earlier discussion (§ 41D(8)), the estimate (in the sense
of the probability₁-mean) of the relative frequency of M in the whole
population of Chicago with respect to the evidence e is equal to the
probability₁ of h on e, hence likewise 0.73. Therefore (4) is logically equivalent
to the following:

(5) ‘The estimate of the relative frequency of M in the whole popula-
tion with respect to the evidence e is 0.73.’

Suppose that X has to make a practical decision, perhaps of an adminis- 
trative or legislative nature, a decision depending upon his knowledge 
concerning the relative frequency of M in the population of Chicago. It 
is clear what he will do; he will act in certain respects as though he knew 
that the relative frequency was 0.73. But it is perhaps not immediately 
clear what the theoretical basis for his action is—in other words, which 
rational procedure would lead to his action. Should he take (3) or (5) as 
a basis for his decision? The proponents of the frequency conception of 
probability will perhaps say that only (3) can serve as a basis because this 
is a statement about the relative frequency in the whole and hence a 
probability statement in their sense. They are right to this extent: if X 
knew (3), he would take it as a basis. However, (3) is not known to X as 
long as his knowledge is restricted to the evidence e concerning the eighty 
observed individuals; (3) is not even highly confirmed on the basis of the 
evidence e. It is rather the other statement that may serve as a basis for 
the decision. This statement is known to X because it is, in either of the 
two equivalent formulations (4) and (5), analytic; (4) follows from the 
presupposed definition of probability₁, and (5) from the definition of the
estimate of a function. The statement (5) is quite analogous to the earlier 
statement (2) concerning the estimate of the length of a rod. Here, again, 
the statement about the estimate cannot be either confirmed or dis- 
confirmed by any future observations. Even if a complete census of the 
population of Chicago showed that the actual relative frequency were 
quite different from 0.73, this would by no means refute the statement 
that the estimate with respect to the evidence e is 0.73. Here, as in the earlier 
case, the decision can be based on the given observational evidence e and 
the analytic statement which gives the estimate with respect to this evi- 
dence e. It is the value of this estimate or, in other words, the value of 
probability₁ that justifies the decision.

We obtain the same result if we consider the following situation. Sup- 
pose that X wants to make a bet on the prediction that an arbitrarily 
chosen individual has the property M. This prediction is the hypothesis 
h, to which the statement (4) ascribes the probability₁ 0.73 with respect
to the available evidence e. Thus on the basis of this statement (4) X will
decide to accept no bet on h with a betting quotient higher than 0.73.
The same decision could also, of course, be based on the statement (5) 
concerning the estimate of the relative frequency. 
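
Why 0.73 is the limiting betting quotient can be sketched numerically. The following is an illustration under stated assumptions, not a definition taken from the text: a bet on h with betting quotient q and total stake 1 is taken to mean that X puts up q and his opponent 1 - q, the winner taking both; the function name and the sample quotients are hypothetical.

```python
from fractions import Fraction

def estimated_gain(probability, betting_quotient, total_stake=1):
    """Probability-weighted mean gain for X when betting on h.

    Assumed form of the bet (illustrative): X stakes q * total_stake; if h
    is true he wins the opponent's (1 - q) * total_stake, otherwise he
    loses his own stake.
    """
    p = Fraction(probability)
    q = Fraction(betting_quotient)
    win = (1 - q) * total_stake
    loss = q * total_stake
    return p * win - (1 - p) * loss  # algebraically (p - q) * total_stake

# Value assumed in the text for the hypothesis h on the evidence e.
p_h = "0.73"
for q in ["0.60", "0.73", "0.80"]:
    print(q, float(estimated_gain(p_h, q)))
```

Under these assumptions the estimated gain is positive for quotients below 0.73, zero at 0.73, and negative above it.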

These considerations show the following. In a sense it is correct to say
that empirical statements concerning the values of physical magnitudes
are important for determining our practical decisions. This holds, in par-
ticular, for the relative frequency in the long run of a property M, in
other words, the probability₂ of M, because the final balance of the
totality of future bets on singular predictions concerning M is determined
by the probability₂ of M (§ 41C). Thus empirical statements and, in par-
ticular, statements on probability₂, may indeed serve as a guide of life.
However, they can do so only if they are known. But the exact value of 
an empirical magnitude is in general not known; and if the value of a 
magnitude is defined as the limit of an infinite sequence of observed values, 
as is the case, for example, with length as interpreted above and with
probability₂ as explicated by Mises and Reichenbach, then the exact value
cannot possibly ever be known. This fact does not make concepts of this 
kind either meaningless or unsuitable for practically useful application. 
But it has the consequence that inductive logic is needed for utilizing these 
concepts. The hypothesis that the actual value of a certain magnitude lies 
within a given small interval may be highly probable, although it is not 
certain; that is to say, it may not follow from the available observational 
knowledge e but its probability₁ with respect to e may be high. And even
if this is not the case for any small interval, as in the examples discussed 
above, we may still calculate the estimate of the value of the magnitude 
with respect to e. In these cases the magnitudes remain practically im-
portant; but they can be utilized only by way either of a high probability₁
or of an estimate, defined with the help of probability₁; without the use
of these concepts of inductive logic those magnitudes would become use-
less. Thus we see that neither empirical science (which includes
probability₂) nor inductive logic (which is based upon probability₁) can serve
alone as a guide of life but only both in co-operation. Science makes ob- 
servations and constructs theories. Inductive logic is necessary in order to 
obtain judgments concerning the credibility of theories or singular pre- 
dictions on the basis of given observational results. And these judgments 
concerning expected events serve as a basis for our practical decisions. In 
analogy to a well-known dictum of Kant, we might say that inductive
logic without observations is empty; observations without inductive
logic are blind.


§ 50. The Problem of a Rule for Determining Decisions 
A. Our problem is to find a rule which tells a man X, with the help of induc-
tive logic, which decisions it would be reasonable for him to make in view of his
past experiences. Such a rule does not belong to inductive logic itself but in-
volves the methodology of induction and of psychology. In this section four
tentative forms of the rule are discussed, each more adequate than the pre-
ceding ones. The final rule will be explained in the next section. B. Rule R₁:
‘Act on the expectation that events with a high probability₁ will happen.’
C. Rule R₂: ‘Among several possibilities, act on the expectation of the one with
the highest probability₁.’ D. Rule R₃: ‘If your decision depends upon a mag-
nitude whose value u is unknown, determine its estimate u′ on the available
evidence and then act in certain respects as though you knew with certainty
that u were equal or near to u′.’ E. Rule R₄: ‘Choose that action for which the
estimate of the resulting gain has its maximum.’ From this is derived a special-
ized rule R₄*: ‘If an offer is favorable (i.e., the estimated gain in case of ac-
cepting is greater than in case of rejecting), accept it; if it is unfavorable, reject
it.’ Even this apparently obvious rule leads in certain exceptional cases to un-
reasonable decisions and hence is in need of further modification.


A. The Problem 


The discussions in the preceding section have thrown some light on the
question as to how considerations of probability₁ influence expectations
of future events and thereby practical decisions. We shall now investigate
this question in greater detail. We presuppose that the observer X is in
possession of a system of inductive logic as a theory of probability₁. This
theory applies to the sentences of X’s language (which may be more com-
prehensive than our systems 𝔏), in which X can formulate the results of
his observations and his predictions of future events. X formulates the re-
sults of all observations which he has made up to the present time in
one comprehensive report e. We assume that he is able to calculate the
value of probability₁ on the evidence e for any hypothesis h in which he
is interested. We disregard here the question of how X calculates these
values; we are at present interested only in the question of how he utilizes
them. In other words, we wish to formulate a rule which tells X how he is
to make his decisions with the help of the values of probability₁ if he
wants his decisions to be rational. For X to act rationally means to learn 
from experience and hence to take as evidence what he has observed. It 
means further that he should avoid considering only a biased selection 
from his experiences and disregarding any available information that 
might be relevant; therefore, we assume that he takes as basis the total 
evidence e available. 

The problem now to be investigated concerning the determination of 
decisions with the help of probability₁ goes beyond the boundaries of in-
ductive logic itself. Inductive logic has only the task of finding statements
concerning probability₁; these statements may give the values of
probability₁ for particular cases or state general properties or relations of such
values. Inductive logic itself is not concerned with the practical applica-
tions of its theorems, any more than pure arithmetic is concerned with the
application of arithmetical theorems for the purposes of planning a family 
budget, or pure geometry is concerned with the application of geometrical 
theorems for the purposes of navigation. In the later construction of a 
system of inductive logic we shall not deal with the problems of applica- 
tion. But in the present preliminary discussions it seems advisable to do 
so. While nobody doubts the theoretical validity and the practical appli- 
cability of arithmetic and geometry, the same does not hold for inductive 
logic; not only its usefulness but even its theoretical possibility is still 
controversial. Therefore, a clarification at least of the general features 
of an application of inductive logic for practical purposes may be helpful 
in contributing to a clarification of its nature and purpose. The distinc- 
tion between the system of pure inductive logic and the procedures and 
rules of its application for practical decisions is emphasized chiefly for 
the following reason. The analysis of the application involves, as we shall 
soon see, in addition to considerations of the general methodology of in- 
duction (§ 44A) also certain assumptions and concepts of a psychological 
nature (for instance, concerning the measurement of preference and 
valuation). Now it is important to see clearly that the problems and diffi-
culties here involved belong to the methodology of a special branch of em-
pirical science, the psychology of valuations as a part of the theory of
human behavior, and that therefore they should not be regarded as diffi- 
culties of inductive logic. 

The following discussion will lead, step by step, from customary crude
formulations of a rule for the determination of practical decisions with
the help of inductive logic to more adequate formulations. Four versions
of the rule will be discussed in this section, the fifth and final one in the
next section.


B. The Rule of High Probability 


Many writers on the calculus of probability and its application have 
declared that it is reasonable to expect that those events will happen which 
are highly probable. This suggests the following rule directed toward X 
and referring to the total evidence e available to X. 


Rule R₁. Assume that those events will occur which have a high value
of probability₁ on evidence e, and act as though you knew that these
events were certain.


This is a crude rule-of-thumb which is often useful. As we shall see, 
however, it would in many cases lead to a wrong decision, that is, one
which would not be regarded as reasonable by sensible people. Further-
more, it has the disadvantage of being applicable only if one of the pos-
sible cases has a high probability₁.


C. The Rule of Maximum Probability 


In order to avoid the disadvantage just mentioned of rule R₁, some
writers have said that the most probable among the possible events should 
be expected, even if its probability is not high. This suggests the follow- 
ing rule. 


Rule R₂. With respect to an exhaustive set of mutually exclusive events
(that is, in semantical terms, a set of hypotheses which are L-disjunct
and L-exclusive in pairs with respect to e) expect that event
which has the highest probability₁, and act as though you knew that
this event is certain.


This rule works satisfactorily in a case of the following kind. 

Example of the bookshop. X has a bookshop and wants to order copies 
of a certain book that is in steady use in order to have them on hand for 
the beginning of the academic year. He has experience with the past sale 
of this book over a number of years. On the basis of this and perhaps other 
relevant information, he finds that the assumption of a sale of 80 copies 
has a probability which, although small, is higher than that of any other 
case; the probability for the number 79 is somewhat less, for 78 still less, 
and so it goes down for smaller numbers, first slowly, then steeply; simi- 
larly, the probability decreases for higher numbers, first slowly, then more 
steeply, in such a way that the curve showing the probability as a func- 
tion of the number of copies has a bell-shaped form, which has its maxi- 
mum for 80 and declines symmetrically from this maximum toward both 
sides. If X follows the rule R₂, he assumes that there will be a demand of
80 copies, and therefore he provides for this number. This decision would 
not be unreasonable (although, as we shall see later, a slightly different 
decision might be still better). This example shows also that the rule R₂ is
better than R₁; the latter rule is not applicable to the cases of the various
numbers of copies, because none of them has a high probability. 

In other cases, however, rule R₂ does not work so well. This is seen by
the following example which at first might appear as quite analogous to
the one just given.

Example of the restaurant. X runs a dining place and decides how much 
of every dish is to be prepared today. He knows from previous experience 
concerning one particular dish that the number of people who order it on
one day varies between 0 and 5; and, in particular, the probability that
the number of people who will order it today will be 0, 1, 2, . . . , 6, is 0.20,
0.19, 0.18, 0.17, 0.16, 0.10, 0, respectively. The dish must be prepared in
advance. Thus the problem for X is: for how many people should he have
it prepared? Rule R₁ is again inapplicable, since none of the cases has a
high probability. What would be the effect of rule R₂? The most probable
assumption is that nobody will order this dish. If X follows rule R₂, he
will act on this assumption and not prepare this dish. But this decision 
does not seem to be the best in view of the fact that the assumption that 
nobody will order the dish has only the probability 1/5; thus, it is prob- 
able to the degree 4/5 that at least one person will order it. 
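
The behavior of the rule of high probability and the rule of maximum probability on the restaurant figures can be sketched as follows. This is a minimal illustration; the function names and the cut-off 0.9 for a "high" probability are assumptions made here, since the text fixes no numerical threshold.

```python
from fractions import Fraction

# Probabilities from the restaurant example for 0, 1, ..., 6 orders.
probabilities = ["0.20", "0.19", "0.18", "0.17", "0.16", "0.10", "0"]
dist = {n: Fraction(p) for n, p in enumerate(probabilities)}

# Hypothetical cut-off for "high"; the text does not give a number.
HIGH = Fraction("0.9")

def rule_high_probability(dist, threshold=HIGH):
    """Events X would treat as certain under the rule of high probability."""
    return [event for event, p in dist.items() if p >= threshold]

def rule_maximum_probability(dist):
    """The single most probable event, as the rule of maximum probability demands."""
    return max(dist, key=dist.get)

high_events = rule_high_probability(dist)
most_probable = rule_maximum_probability(dist)
p_at_least_one_order = 1 - dist[0]

print(high_events)           # []
print(most_probable)         # 0
print(p_at_least_one_order)  # 4/5
```

The first rule selects nothing, and the second selects the case of no orders even though at least one order has probability 4/5.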

In each of the following two examples only two possible cases need be 
considered, one of which has a very high probability. Thus in these ex-
amples, both rules R₁ and R₂ are applicable. Both these rules advise X to
act on the assumption of the event which has the high probability; but 
this advice is wrong in both examples. 

Example of the lottery. The lottery consists of one hundred tickets; it is 
known (that is, it follows from e) that exactly one ticket will win; the 
prize is $100; the information concerning the lottery mechanism is such 
that all hundred tickets have an equal chance of winning. X has one ticket.
Thus the probability of his winning is 0.01, that of his not winning is 0.99.
If now X were to take either rule R₁ or R₂ literally, he would act as if he
knew for certain that his ticket will not win. This would lead, for example,
to the unreasonable decision of selling his ticket to somebody who offers
10 cents for it.

Example of the fire insurance. X owns a house whose value is $10,000. 
His knowledge e contains statistical information concerning a large num- 
ber of houses which were under similar conditions and of which a certain 
fraction burned down during a certain period. X finds that with respect 
to this information the probability of the assumption h that his house will 
burn down during the next year is 0.001; thus the probability of ~h is
0.999. Should X take out fire insurance if the premium for one year were
$5.00? This would be a very cheap insurance, and it would certainly be
advisable to take it. However, if X were again to follow either rule R₁ or R₂
literally, he would act as though he knew with certainty that his house 
was not going to burn down within the next year and therefore he would 
decide against insurance. 

Thus we have found that rule R₂, though better than R₁, nevertheless
leads to wrong decisions in certain situations. Therefore, we have to look
for a better rule.


D. The Rule of the Use of Estimates


Let us examine, in the example of the lottery, why rule R₂ went wrong
and how it should be changed. It is clear that $1.00 would be a fair
price for a ticket, because, if all tickets are sold at this price, the man who
arranges the lottery comes out even, and so do the buyers of tickets taken
together. Therefore X should not buy a ticket for more than $1.00 nor
sell it for less (with a certain qualification to be explained later). This
shows that the amount which should determine X’s decision is neither
the most probable gain (as rules R₁ and R₂ would have it) because this is
zero, nor the one possible positive gain, which is $100, but rather the
estimate of his gain with respect to the available evidence e. (Here again
we use the term ‘estimate’ in the sense of ‘probability₁-mean estimate’ as
defined by (3) in § 41D.) This estimate is $1.00, as is easily seen from the
definition mentioned. Therefore X, as a rational agent, will regard $1.00
as the money value of his ticket. We assume that X is able to calculate
not only the values of probability₁ on evidence e but also, on their basis,
the values of estimates on e.

These considerations suggest the following rule: 


Rule R₃. Suppose that your decision depends upon a certain magnitude
u unknown to you, in the sense that, if you knew u, then this would
determine your decision (that is, there is a function F such that a
certain feature of your decision would take the value F(u)). Then
calculate the estimate u′ of u with respect to the available evidence e
and act in certain respects as though you knew with certainty that
the value of u were either equal to u′ (that is, let the feature in ques-
tion take the value F(u′)) or near to u′.


It is easily seen that this rule is much better than the two previous ones;
but we shall find that it still has some weak points. In the example of the
bookshop the estimate of the number of books demanded is equal to the
number with the highest probability, that is, 80, because of the sym-
metry of the probability curve (this follows from the definition of the
probability₁-mean estimate). Therefore, rule R₃ leads, just as R₂, to the
decision of keeping 80 copies in stock. This decision seems fairly rea-
sonable.

In the example of the restaurant the estimate of the number of persons
ordering the dish is found to be 2.2. Therefore, according to rule R₃, X ex-
pects that two persons will order and hence prepares for two orders. Now
it becomes clear why the previous rule R₂ worked well in the case of the
bookshop but not in that of the restaurant, although the situations seem
similar. The reason is that in the case of the bookshop, but not in that of
the restaurant, the estimated value is equal to the most probable value.
This holds in many cases but by far not in all. Only in those cases where
it holds is the frequently used formulation R₂ adequate.
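
The contrast between the estimated value and the most probable value can be checked with a short calculation. This is a sketch only: the restaurant figures are those of the text, while the five-point symmetric distribution for the bookshop is a hypothetical stand-in for the bell-shaped curve, for which the text gives no numbers. The function names are likewise illustrative.

```python
from fractions import Fraction

def mean_estimate(dist):
    """Probability-weighted mean of the possible values."""
    return sum(value * p for value, p in dist.items())

def most_probable(dist):
    """The value with the highest probability."""
    return max(dist, key=dist.get)

# Restaurant: probabilities from the text for 0, 1, ..., 6 orders.
restaurant = {n: Fraction(p) for n, p in enumerate(
    ["0.20", "0.19", "0.18", "0.17", "0.16", "0.10", "0"])}

# Bookshop: HYPOTHETICAL symmetric distribution with its maximum at 80.
bookshop = {78: Fraction(1, 10), 79: Fraction(2, 10), 80: Fraction(4, 10),
            81: Fraction(2, 10), 82: Fraction(1, 10)}

print(most_probable(restaurant), float(mean_estimate(restaurant)))  # 0 2.2
print(most_probable(bookshop), float(mean_estimate(bookshop)))      # 80 80.0
```

For the symmetric distribution the two values coincide; for the restaurant they are 0 and 2.2.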

In the example of the lottery, the estimate of X’s gain is $1.00. There-
fore, according to rule R₃, X is not willing to pay for a ticket more than
this amount or to sell it for less. In the example of the fire insurance, the
estimate of X’s loss from fire within the next year is $10.00. Thus rule R₃
leads X to the decision of taking out insurance if the premium is not more
than this amount.
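
The two estimates just stated follow from the figures of the examples. The sketch below assumes the estimate is the probability-weighted mean of the possible values; the function and variable names are illustrative only.

```python
from fractions import Fraction

def mean_estimate(outcomes):
    """Probability-weighted mean of (value, probability) pairs."""
    return sum(Fraction(value) * Fraction(p) for value, p in outcomes)

# Lottery: prize of $100 with probability 0.01, nothing otherwise.
lottery_gain = mean_estimate([(100, "0.01"), (0, "0.99")])

# Fire insurance: loss of $10,000 with probability 0.001, none otherwise.
fire_loss = mean_estimate([(10000, "0.001"), (0, "0.999")])

# Premium from the example; insure if it does not exceed the estimated loss.
premium = Fraction("5.00")
take_insurance = premium <= fire_loss

print(lottery_gain)    # 1
print(fire_loss)       # 10
print(take_insurance)  # True
```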

In all four cases the decisions determined by rule R₃ seem quite rea-
sonable. In one case (bookshop) the decision is the same as that de-
termined by rule R₂; in the other three cases the decisions by R₃ are much
more reasonable than those by R₂.

However, the same rule R₃, if applied without qualification to other
aspects of the four examples, would lead to quite unreasonable decisions. 
This is the reason for the qualifying phrase ‘in certain respects’ in the 
formulation of the rule. The weakness of the rule is the vagueness of this 
phrase; the rule does not specify in which respects X may act as if he 
knew that the estimate were the actual value and in which respects he 
may not. That there are certain respects in which the described way of 
acting would not be reasonable is easily seen as follows. If, in the example 
of the bookshop, X were to act in every respect as though he knew with 
certainty that exactly 80 copies will be demanded, then he would be will- 
ing to bet a thousand against one on the prediction that the number of 
copies demanded will be exactly 80—obviously an unreasonable decision. 
For this reason the rule was formulated in such a manner as to admit the 
weaker expectation that the actual value is, if not equal, then near to the 
estimate. But if X acted in every respect as though he knew that the num- 
ber of copies demanded will be between, say, 60 and 100, then he would 
be willing to bet one thousand against one on this prediction, which would 
again be unreasonable. We might perhaps consider modifying the rule 
somehow to the effect of advising X not to regard as certain even the 
weaker prediction that the actual value is near the estimate and hence to 
bet upon this prediction only at moderate odds. But a modified rule of this 
kind, although working all right in the case just discussed, would in other 
cases still lead to wrong decisions. In the example of the lottery it would 
make X willing to bet at moderate odds on the prediction that his gain 
will be near to $1.00, say, between 50 cents and $2.00, although X knows 
from the available information e that such an outcome is impossible.
Many cases are similar to this example in so far as the estimate is not even
near to any of the possible values. 

The difficulty which we have discussed consists in the fact that rule R₃ 
does not specify in which respects it should be applied and in which not. 
But there is another, more serious, difficulty which would remain even 
if we found a way of overcoming the first. Let us assume that we had suc- 
ceeded in making the required specification in an adequate way, although 
it is not easy to see how this could be done in a general way. In particular, 
let us assume that the modified rule were such that X could apply it in 
our examples only in the following respects: in the example of the lottery, 
only for the determination of the price at which he is willing to buy or 
sell a ticket; in the example of the fire insurance, only for the determina- 
tion of the premium he is willing to pay; in the example of the bookshop, 
only for the determination of the number of copies to be provided; in the 
example of the restaurant, only for the determination of the number of 
servings to be prepared. Even then the decisions determined by the rule 
in these and similar cases are not always the best that could be taken in 
the situation in question. If a bookseller estimates the number of copies 
that will be demanded at 80, he will actually order not this number, but 
a somewhat larger number. For if he has fewer books than will be de- 
manded, he misses a profitable business, while if he has more books, he 
incurs merely the minor disadvantage of having to store the unsold copies 
for a later occasion or to return them to the publisher. 

Let us try to describe the essential features of this situation in general 
terms. X’s decision depends upon an unknown value u. Suppose he 
chooses, no matter by what means, rational or irrational, a value u″ and 
acts so as to be prepared for this value. If then the actual value of u 
happens to be u″, X is properly prepared and hence in a favorable situation. 
If, however, the actual value u turns out to be either higher or lower than 
u″, the case is unfavorable for X. If it is a financial matter, for instance, a 
business affair or a game or a bet, X suffers a loss in this case. Now the 
decisive point is that in certain situations the losses to be expected are not 
symmetrically distributed but are higher on one side. If X has to expect a 
higher loss in case he is underprepared (i.e., u″ < u) than in case he is 
overprepared (i.e., u″ > u), then he should guard more against 
underpreparedness than against overpreparedness. This means that he should 
choose as the value u″ for which he prepares not the estimate u′ of u, but 
a somewhat higher value in order to make the unfavorable result of 
underpreparedness less probable. In his choice of the value u″ he must take 
into consideration not only the possible values of u but also, and 
essentially, his gains (including losses as negative gains) in all the possible 
cases and their probabilities. The choice of a particular decision is 
ultimately to be determined by the estimates of his gains for the various 
possible decisions rather than by estimates of the other magnitudes 
involved. 
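
The effect of asymmetric losses can be illustrated numerically. The following Python sketch is only an illustration under invented assumptions: the demand distribution (every number of copies from 60 to 100 equally probable), the per-copy losses `loss_under` and `loss_over`, and all function names are hypothetical and are not taken from the text. It compares the estimate of the unknown value with the value for which the estimated loss is least.

```python
from fractions import Fraction


def estimate(values, probs):
    """Probability-weighted mean of the possible values (the estimate)."""
    return sum(v * p for v, p in zip(values, probs))


def estimated_loss(prepared, values, probs, loss_under, loss_over):
    """Estimated loss if X prepares for `prepared` while the actual value is unknown."""
    total = Fraction(0)
    for u, p in zip(values, probs):
        if prepared < u:
            # Underprepared: a loss for every unit of demand that cannot be met.
            total += p * loss_under * (u - prepared)
        else:
            # Overprepared (or exactly prepared): a loss for every unused unit.
            total += p * loss_over * (prepared - u)
    return total


# Hypothetical demand: 60, 61, ..., 100 copies, all equally probable.
values = list(range(60, 101))
probs = [Fraction(1, len(values))] * len(values)

u_estimate = estimate(values, probs)
# Assumed losses: a missed sale costs three times as much as an unsold copy.
u_prepared = min(
    values,
    key=lambda c: estimated_loss(c, values, probs, loss_under=3, loss_over=1),
)

print(u_estimate)  # 80
print(u_prepared)  # 90
```

Under these assumed numbers the estimate is 80, but the least estimated loss is reached by preparing for 90; with equal losses on both sides the two values would coincide.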


E. The Rule of Maximizing the Estimated Gain 


The preceding considerations suggest a new rule involving only esti- 
mates of one magnitude, the gain of X, and saying roughly that X should 
choose that course of action for which the estimate of his gain has the 
highest possible value. We consider at the present moment only gains or 
losses of money or of such other things as can be bought for money, for 
example, a book, a meal, a concert, the advice of a lawyer, a trip to the 
mountains. The problem of the so-called imponderables, that is, advan- 
tages that cannot be bought and disadvantages that cannot be bought off, 
will be discussed in the next section, because the solution of this problem 
is closely connected with the concept of utility involved in the next rule. 
There is a set of actions which are possible for X at the present moment 
out of which he has to choose one. Let these possible actions be described 
by the sentences j₁, j₂, ..., jᵢ, .... Let the possible events which might 
result from any of these actions, together with other factors in the 
situation not influenced by X, be described by the sentences h₁, h₂, ..., hₖ, 
.... If X carries out a particular action jᵢ, then some of these events 
may become impossible; others, which remain possible, may change their 
probabilities. [For the sake of simplicity we assume in the present informal 
discussion that both the number of possible actions and the number of 
possible resulting events are finite. The analysis for infinite sets of 
possibilities would merely be somewhat more complicated mathematically, 
but the basic features would remain the same. Note that the j-sentences 
are L-exclusive in pairs and L-disjunct with respect to e; and the same 
holds for the h-sentences.] We assume that X is able to assess in terms of 
money units the value of his wealth in any possible situation. Let us call 
this value his fortune in the situation in question. Let f₀ be the fortune of 
X at the present moment, and fᵢₖ the fortune he would have in case he 
carried out the action jᵢ and the event hₖ occurred. By his gain gᵢₖ in this 
case we mean the increase of his fortune in consequence of his action jᵢ 
and the event hₖ; hence gᵢₖ = fᵢₖ − f₀. A loss is here taken as a negative 
gain. Suppose X considers one of the possible actions, say, jᵢ at the 
present moment, before he actually chooses and carries out any of the actions. 
He does not know what will be the actual gain gᵢ in case of his action jᵢ, 
because this gain depends also upon the unknown h-events. Nevertheless, 
X can make an estimation of this gain. He is able to calculate the 
probability which any of the possible events, say, hₖ, would have if he were to 
carry out the considered action jᵢ; that is, the probability₁ of hₖ on the 
evidence e . jᵢ; let the value which he finds for this probability be qᵢₖ. 
With the help of the probabilities qᵢ₁, qᵢ₂, ..., qᵢₖ, ... for the events 
h₁, h₂, ..., hₖ, ..., he can now calculate the estimate g′ᵢ of the gain gᵢ in 
case of the action jᵢ. According to our definition of an estimate (in the 
sense of the probability₁-mean, § 41D(3)), g′ᵢ = Σₖ (gᵢₖ × qᵢₖ). In this way 
X can calculate for each of the possible actions the estimate of his gain 
resulting from this action. Then the reasonable thing for him to do is to 
decide upon that action for which this estimate has its maximum. Thus 
the general rule must say in effect: ‘ Maximize your estimated gain!’ It may 
be formulated as follows: 

Rule R₄. Among the possible actions choose that one for which the 
estimate of your gain, determined with the help of the probabilities of 
the possible outcomes, is not lower than for any other possible action. 
If several actions lead to the maximum value of the estimate, 
you may choose any one of them; it does not matter which one. 
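
The rule of maximizing the estimated gain can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function names and the lottery figures (a ticket costing 1 that wins 100 with probability 1/50) are assumptions chosen only to make the arithmetic visible. For each action the estimate of gain is the sum, over the possible events, of the gain multiplied by the probability of the event given that action.

```python
from fractions import Fraction


def estimated_gain(gains, probs):
    """Estimate of the gain: sum over the events of gain times probability."""
    if sum(probs) != 1:
        raise ValueError("the probabilities of the possible events must sum to 1")
    return sum(g * q for g, q in zip(gains, probs))


def choose_by_estimated_gain(actions):
    """Return all estimates and every action whose estimate is maximal.

    `actions` maps an action name to a pair (gains, probs) with one entry
    per possible event.
    """
    estimates = {name: estimated_gain(g, q) for name, (g, q) in actions.items()}
    best = max(estimates.values())
    return estimates, [name for name, est in estimates.items() if est == best]


# Hypothetical lottery: the ticket costs 1 and wins 100 with probability 1/50.
win, lose = Fraction(1, 50), Fraction(49, 50)
actions = {
    "buy ticket": ([99, -1], [win, lose]),
    "do not buy": ([0, 0], [win, lose]),
}
estimates, best_actions = choose_by_estimated_gain(actions)

print(estimates["buy ticket"])  # 1
print(best_actions)             # ['buy ticket']
```

Returning a list of maximal actions, rather than a single one, mirrors the clause that any action reaching the maximum may be chosen.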


This rule is essentially better than R₃. It eliminates both difficulties 
which we discussed in connection with R₃. The first difficulty resulted 
from the fact that rule R₃ advised X to act in certain respects as if he 
knew that the actual value was equal to the estimate. To act as if one 
knew what in fact one does not know is a risky procedure. Rule R₄ does 
not contain any such as-if clause; the procedure prescribed does not 
involve any pretension to knowledge not actually available. The second 
difficulty consisted in the fact that in certain cases the reasonable action 
conforms, not to the estimate itself of a certain magnitude, but to a value 
differing slightly from the estimate in one direction, provided the 
expected losses are less in this direction. In cases of this kind, rule R₄, in 
distinction to R₃, leads to the action for which the least loss is to be 
expected. (For instance, a closer examination would easily show that in the 
example of the bookshop rule R₄ would lead X to the decision of ordering 
a certain number of books somewhat greater than eighty, if certain 
plausible assumptions concerning the losses in the various cases are made.) 

In certain simple cases rule R₄ leads to the same decision as R₃; for 
instance, if the decision concerns the acceptance of a bet or the buying or 
selling of a lottery ticket. In many other cases rule R₃ leads to a decision 
which is, if not the best, at least near to the best decision. Therefore, rule 
R₃ need not be entirely discarded; it may be regarded as a cruder form 
whose use is often convenient because of its greater simplicity. Thus, 
although the more refined rule R₄ applies the procedure of estimation to 
values of only one magnitude, the gain resulting for a person, nevertheless 
the estimates of many other magnitudes are still useful under certain 
conditions. This holds especially for the estimates of absolute and relative 
frequency. 

The situation in which X finds himself is often of such a kind that he 
has to choose between two alternatives only. For instance, he may either 
do a certain thing or refrain from doing it. For example, somebody offers 
X a bet or a business deal under specified conditions which X is not al- 
lowed to change; he has merely the choice of either accepting or declining 
the offer. Let the two actions be described by j₁ and j₂. Let the fortune 
of X which would actually result in the case of the action j₁ be f₁. f₁ is 
unknown. Let the estimate of f₁ with respect to e . j₁ be f′₁. Then the 
gain g₁ in this case is f₁ − f₀, and its estimate g′₁ is f′₁ − f₀. Let f₂, f′₂, g₂, 
and g′₂ be the analogous values for the action j₂. If f′₁ > f′₂ (and hence 
g′₁ > g′₂), we call the first action favorable for X and the second 
unfavorable. If f′₁ = f′₂ (and hence g′₁ = g′₂), we call both actions neutral for X; 
the deal or game or bet offered will likewise be called neutral in this case. 
In other words, an action is favorable or unfavorable or neutral if the 
difference between the estimates of fortune (or of gain) is positive, nega- 
tive, or zero, respectively. (Sometimes the situation is such that in the 
case of one of the two decisions, the fortune of X is expected to remain 
unchanged; in other words, the estimate of the gain is zero. In this case 
the other decision is favorable, unfavorable, or neutral if the estimate of 
gain for this decision is positive, negative, or zero, respectively.) 

If rule R₄ is applied to the case of an offer made to X in the form of an 
alternative, it leads to the following specialized rule: 


Rule R₄*: If the offer is favorable for you, accept it; if it is unfavorable, 
reject it; if it is neutral, you may accept or reject it. 


This special rule seems in accord with common sense. It may even 
appear as too obvious and trivial to deserve explicit statement. This 
appearance, however, is deceptive. Rule R₄ and likewise the specialized form R₄* 
lead indeed to reasonable decisions in the great majority of cases. But 
there are certain cases where the resulting decisions are not the best ones. 
We shall now consider such exceptional cases; their examination will lead 
to a further refinement of the rule. 

Example of the bet on a coin. The present fortune of X is 10,000 (with 
the dollar as value unit). Somebody offers X a bet on the outcome of a 
throw of a coin. The coin is known to both bettors to be symmetrical; 
hence the probability of either result is 1/2. (i) Suppose the bet is offered 
at even odds; then it is neutral for X, and hence rule R₄* permits him to 
accept. (ii) Suppose that X’s stake is smaller than the other; then the bet 
is favorable for X and hence the rule commands acceptance. The rule 
determines these decisions irrespective of the absolute amount of X’s 
stake in the bet. Suppose, however, that X’s stake is 8,000 and his 
partner’s stake either (i) 8,000 or (ii) 8,001; then all sensible people would 
regard X’s acceptance of the bet as very unreasonable. Some would tell 
him that under no condition should a reasonable man risk a considerable 
part of his fortune on the flip of a coin. Others might perhaps be less 
severe; they would permit such a risk if the offer were extremely 
favorable, say, at odds of eight thousand to a million. 
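
The verdicts of the specialized rule for the two coin bets can be checked with a short Python sketch. The function names are hypothetical; the stakes and the probability 1/2 are those of the example. Note that X’s fortune does not enter the calculation at all, which is exactly the feature of the rule that the example calls into question.

```python
from fractions import Fraction


def classify_offer(estimate_accept, estimate_decline=0):
    """Classify acceptance of an offer by comparing the two estimates of gain."""
    if estimate_accept > estimate_decline:
        return "favorable"
    if estimate_accept < estimate_decline:
        return "unfavorable"
    return "neutral"


def coin_bet_estimate(own_stake, partner_stake, p_win=Fraction(1, 2)):
    """Estimate of X's gain: he wins the partner's stake or loses his own."""
    return p_win * partner_stake - (1 - p_win) * own_stake


even_odds = coin_bet_estimate(8000, 8000)        # case (i)
slightly_better = coin_bet_estimate(8000, 8001)  # case (ii)

print(even_odds, classify_offer(even_odds))              # 0 neutral
print(slightly_better, classify_offer(slightly_better))  # 1/2 favorable
```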

How should we then modify rule R₄*? Should we make it more 
restrictive in such a manner that it requires for acceptance not only that 
the deal be favorable but that it be favorable to a sufficient degree de- 
pendent upon the ratio between X’s stake and his fortune? But a modifi- 
cation of this kind would not do. We shall see that there are other cases 
which suggest, not a restriction, but a liberalizing of the rule; cases in 
which it is reasonable to accept an offer although it is unfavorable. 

A simple case of this kind is provided by the example of the fire 
insurance. Suppose that X’s present possessions consist of a house valued at 
10,000 and 100 in cash; hence his present fortune is f₀ = 10,100. He has 
to choose between two actions: j₁ consists in taking out fire insurance for 
one year for his house at the full value of 10,000, for which he pays a 
premium in the amount of r; j₂ consists in not taking insurance. There are 
two possible events relevant for the outcome; h₁: the house will burn 
down during the year of insurance, and h₂, which is ~h₁: the house will 
not burn down. According to our previous assumption for this example, 
X’s knowledge e contains information of previous experiences concerning 
similar houses of such a kind that the probability of h₁ with respect to e 
is 0.001. Let us assume that insuring or not-insuring does not influence 
the chance of a conflagration; this means that the probability of h₁ is 
likewise 0.001 with respect to e . j₁ and to e . j₂. Then on each of the 
evidences e, e . j₁, and e . j₂, the probability of h₂ is 0.999. We assume for 
the sake of simplicity that X has no gains or losses during the year except 
those connected with the insurance and a possible conflagration. Let us 
first assume that X takes out the insurance. Hence he pays the premium r. 
If now the house burns down (h₁), he has a loss of 10,000 but is 
reimbursed for it; thus his gain is g₁₁ = −r. If the house does not burn down, 
his gain g₁₂ is likewise −r. Thus, in the case of insurance (j₁), his gain g₁ is 
certainly −r, irrespective of the probability of a conflagration; hence the 
estimate g′₁ is here, in a trivial way, = g₁ = −r. Now suppose that X 
does not insure (j₂). If then the house burns down (h₁), his gain is 
g₂₁ = −10,000; the probability for this is q₂₁ = 0.001. If the house does not 
burn down (h₂), his gain is g₂₂ = 0; hence the probability for this case is 
irrelevant. Thus the estimate of gain in the case of noninsurance (j₂) is 
g′₂ = (−10,000) × 0.001 = −10. Therefore insurance is favorable, 
unfavorable, or neutral for X, if the premium r is < 10, > 10, or = 10, 
respectively. Suppose that the insurance company has the same 
information as X concerning the statistics of past conflagrations. Then it will 
certainly demand a premium r > 10, because the premiums received will 
not only have to balance the payments for damage by fire but must cover 
also the administrative expenses and maybe yield a profit. Let us 
therefore assume that the premium is 12. Then the insurance is unfavorable for 
X, and hence rule R₄* would prohibit it. On the other hand, to take out 
insurance under the circumstances described would be regarded by 
everybody as reasonable, and not to do it would be regarded by most as 
unreasonable. 
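
The arithmetic of the fire-insurance example can be reproduced with the following Python sketch. The constant and function names are hypothetical; the house value, the probability of fire, and the premium of 12 are those of the example, and the premiums 10 and 7 are added only to show the neutral and favorable cases.

```python
from fractions import Fraction

P_FIRE = Fraction(1, 1000)  # probability that the house burns down
HOUSE_VALUE = 10_000


def estimate_insured(premium):
    """Estimate of gain with insurance: the gain is -premium in both events."""
    return P_FIRE * (-premium) + (1 - P_FIRE) * (-premium)


def estimate_uninsured():
    """Estimate of gain without insurance: -10,000 if fire, 0 otherwise."""
    return P_FIRE * (-HOUSE_VALUE) + (1 - P_FIRE) * 0


def insurance_verdict(premium):
    """Classify the action of insuring by comparing the two estimates of gain."""
    insured, uninsured = estimate_insured(premium), estimate_uninsured()
    if insured > uninsured:
        return "favorable"
    if insured < uninsured:
        return "unfavorable"
    return "neutral"


print(estimate_uninsured())   # -10
print(estimate_insured(12))   # -12
print(insurance_verdict(12))  # unfavorable
print(insurance_verdict(10))  # neutral
print(insurance_verdict(7))   # favorable
```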

We have found that rule R₄*, demanding the acceptance of favorable 
offers and the rejection of unfavorable ones, works satisfactorily in most 
cases but not in certain exceptional cases. There are cases in which it 
would be reasonable to reject a favorable offer and other cases in which 
it would be reasonable to accept an unfavorable offer. Thus a further 
refinement of rule R₄* and thereby of rule R₄, from which R₄* was derived, 
seems necessary. 

Such a refined version of the rule will be developed in the next section. 


§ 51. The Rule of Maximizing the Estimated Utility 


A. The decisive factor for X’s choice of an action is not the physical gain, 
i.e., the monetary value of the goods acquired, but rather the moral gain 
or utility, i.e., the measure of the satisfaction derived by X from the 
goods. Therefore, the last of the rules for determining decisions discussed in the 
preceding section (R₄) must be replaced by the following rule R₅: ‘Choose that 
action for which the estimate of the resulting utility has its maximum’. The 
use of this rule presupposes that utility can be measured and that there is a 
quantitative law stating the utility as a function of the gain. 

B. Daniel Bernoulli has stated two laws which are relevant here, a general 
law in comparative terms and a more specific law in quantitative terms. The 
first says that the utility of a fixed physical gain added to an initial fortune is 
the smaller the larger the initial fortune (1); the second says that it is inversely 
proportional to the initial fortune (2). These are psychological hypotheses. 
C. If we assume these laws, or at least the first, we obtain the following 
results. Even a fair bet or game of chance is morally unfavorable for both 
partners, that is to say, the estimate of the utility is negative. Further, it is morally 
favorable to take out fire insurance even at a premium somewhat higher than 
the fair premium (i.e., the estimate of loss by fire). Thus the new rule R₅ leads 
to reasonable decisions even in those exceptional cases where rule R₄ did not. 


A. The Rule of Maximizing the Estimated Utility 


In the preceding section we have examined the rule R₄, which prescribes 
that action for which the estimate of gain is a maximum, and the special 
rule R₄*, which says that a favorable offer must be accepted and an 
unfavorable rejected. These rules lead to reasonable decisions in most cases 
but not in all. We found, in particular, two examples of exceptional cases. 
(1) The offer of a bet on heads at 8,000 against 8,001 is favorable; however, 
if X’s fortune is 10,000, it would not be reasonable for him to accept it. 
(2) The offer of fire insurance at a premium of 12 is unfavorable under the 
conditions described; however, it would be reasonable for X to insure. 

These two cases are alike in the following respect. There is a possi- 
bility of a loss for X which is not small in relation to his fortune; there- 
fore, as a cautious man, he ought to choose the decision which avoids the 
large loss, although this decision is slightly unfavorable for him. This 
might suggest a restriction of rule R₄* to the effect that X should choose 
a favorable action only if none of the losses which are possible on the basis 
of this decision is large in relation to his fortune. And it has indeed often 
been said that in the case of a bet the probability may be regarded as 
representing a betting quotient for a fair bet, neither favorable nor un- 
favorable for either side, only if the stake of each partner is small in rela- 
tion to his fortune. Restricting the rule in this way seems well in accord 
with common conceptions of reasonable decisions. However, this pro- 
cedure would merely limit the field of application of the old rule. The new 
rule would not tell us what to do in the excluded cases, those involving 
the possibility of large losses. Our problem is to state a general rule appli- 
cable in any case no matter whether the risks involved are small or large. 
It would not do to stipulate that large risks must be avoided at any price. 
There are certainly situations in which each of the possible decisions 
involves a large risk. And even in a situation where one of two possible 
decisions involves a large risk while the other does not, it may be advisable 
not to take the latter decision if the price is too high. In the example of the 
fire insurance it seems reasonable for X to insure even if the premium is 
more than 10 and hence the insurance is unfavorable, provided it is not 
too unfavorable. If the only opportunity of fire insurance for X involves 
a premium of 300, it seems questionable whether it would not be wiser for 
X to leave the house uninsured. What is needed is a general rule that says 
in a case like this exactly where the boundary line of the too unfavorable 
decision is. 

A way to a solution might be found if we could answer the question why 
it is that those possible cases which involve a large loss for X ought to be 
given special consideration; in other words, why X, in choosing his de- 
cision, ought to assign to such a case not only a weight proportional to 
the amount of the loss involved (as is done by rule R₄) but a still higher 
weight. The answer is: the weight of a large loss should be more than 
proportional because X would suffer from a large loss disproportionately. 
If X has a fortune of 10,000, then he would suffer from a loss of 8,000 not 
only eight times as much as from a loss of 1,000, but much more because 
it would mean his near ruin. If X were to lose by ten successive accidents 
1,000 each time, then every loss would hurt him more than the preceding 
ones, and the last would be the worst. Inversely, if X, with an initial for- 
tune of zero, were to make ten or any other number of successive gains of 
1,000 each, the satisfaction derived from the first gain would be the great- 
est, and those derived from the subsequent gains would be smaller and 
smaller. 

Following the terminology of economists, we shall call the capacity 
of a certain amount of money or goods for satisfying the needs of a cer- 
tain person the utility of that amount for this person. [Other terms used 
for this concept are ‘moral gain’ (Laplace) and ‘subjective value’.] It seems 
that the following law holds generally within a wide field. 


(1) Law of diminishing marginal utility. If a certain gain (a certain 
amount of goods or money) is added to an initial fortune f₀, then the 
utility of this gain is the smaller, the higher f₀. 


This is, of course, not a law of inductive logic but an empirical law 
concerning the reactions of human beings, hence a law of psychology; but 
it is of importance for the application of inductive logic in determining 
practical decisions. This law was first pronounced by Daniel Bernoulli. 
It is well known in economics. 
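
The comparative law can be illustrated with any increasing, concave function of fortune. In the following Python sketch the square root is used purely as an assumed stand-in; it is not a law proposed in the text, and the function name `utility_of_gain` is hypothetical. The figures (successive gains of 1,000 from a fortune of zero, and a loss of 8,000 from a fortune of 10,000) are those of the preceding discussion.

```python
import math


def utility_of_gain(initial_fortune, gain, total_utility=math.sqrt):
    """Utility of `gain` added to `initial_fortune`, taken as a difference of
    total utilities. The square root is an illustrative assumption only."""
    return total_utility(initial_fortune + gain) - total_utility(initial_fortune)


# Ten successive gains of 1,000 each, starting from a fortune of zero.
successive = [utility_of_gain(f0, 1000) for f0 in range(0, 10_000, 1000)]
is_diminishing = all(a > b for a, b in zip(successive, successive[1:]))

# One loss of 8,000 from a fortune of 10,000, compared with eight times
# the (negative) utility of a single loss of 1,000.
big_loss = -utility_of_gain(10_000, -8000)
eight_small = -8 * utility_of_gain(10_000, -1000)

print(is_diminishing)          # True
print(big_loss > eight_small)  # True
```

Under this assumed function each further gain of 1,000 yields less utility than the one before, and the single large loss weighs more than eight times the small one.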

The aim of X in all his actions is the satisfaction of his needs and the 
avoidance of suffering, which we may regard as negative satisfaction. Gains 
in money or goods are appreciated as means of obtaining satisfaction; 
thus what counts is their utility. Therefore X’s decisions must be guided 
by the principle of maximizing the utility of his gains rather than the gains 
themselves. Since, however, he cannot foresee future events, gains, and 
utilities with certainty but only with probability, he must apply the 
maximizing principle to the estimate of utility rather than to the unknown 
utility itself. This, however, presupposes that certain problems are solved 
which involve serious difficulties: first, utility must be measurable, and, 
further, a law must be known determining the utility of gains. 

The first problem is to find a method for measuring the (positive or nega- 
tive) utility of a gain (or a loss as a negative gain) for a certain person at a 
certain time; the (positive or negative) gain may consist in the acquisi- 
tion (or loss) of money, goods, or other advantages. In other words, a 
quantitative explicatum must be found for the inexact concept of utility 
as an explicandum, which is perhaps not quantitative but merely com- 
parative. The basic problem consists in measuring the utility of money. If 
this is possible, then it might be possible to measure the utility of other 
goods and advantages (or disadvantages) by establishing utility equiva- 
lences between them and amounts of money. This seems possible at least 
for those goods which can be exchanged, bought, and sold. But it might 
not be impossible even for the so-called imponderables, for example, a 
disease or the recovery from it, the positive or negative prestige gained 
by composing a good or a bad symphony, the gaining or losing of the love 
of a woman. It may be possible, at least theoretically, to determine the 
utility of events of this kind for X by determining his preferential reac- 
tions. Even if neither X nor the medical authorities accessible to him 
know how to cure a certain disease (which he has, or if he had it), never- 
theless he can imagine a fairy confronting him with the alternative of 
either curing the disease or giving him a certain amount of money. Al- 
though the situation is imaginary, X can ask himself what he would pre- 
fer, and his answer measures his actual valuation. There are amounts of 
money which he will value less than the cure, and perhaps others which 
he will value more; and there will be intermediate amounts with respect 
to which he has no clear preference either way and which thus will repre- 
sent a money equivalent for the utility of the advantage or disadvantage 
in question. It must be admitted that there are some serious problems in- 
volved in this assumption of the possibility of measuring the utility of all 
advantages and disadvantages for a given person at a given time on the 
basis of one common, one-dimensional scale. But something like this 
assumption is usually taken as a basis of an analysis of what is called 
‘rational behavior’ in many parts of social science, especially in economics 
and ethics; and it is indeed hard to see how such an analysis could be 
made without this assumption. For our present purpose, we need not enter 
into a critical examination of the assumption. That belongs to the task of 
the methodology of the fields mentioned. We presuppose here the general 
methodological assumptions underlying an analysis of rational behavior. 
Our present task is merely to clarify the functions which the inductive 
concepts of probability₁ and estimate have in determining rational 
behavior. 

The problem of the measurability of utility is much discussed in mathematical 
economics. See, e.g., Ragnar Frisch, New methods of measuring marginal utility 
(Tübingen, 1932); Oscar Lange, “The determinateness of the utility function”, 
Review of Economic Studies, 1 (1933-34), 218-25; Harold T. Davis, The theory of 
econometrics (Bloomington, Ind., 1941), chap. iii; Paul A. Samuelson, 
Foundations of economic analysis (Cambridge, Mass., 1947), pp. 90 ff. and 173 ff. John 
von Neumann and Oskar Morgenstern ([Games], pp. 15-31, 617-32) discuss 
the problem of a quantitative concept of utility and construct an axiom sys- 
tem for it. Against those economists who propose to use the concept of utility 
merely in a comparative form (e.g., in the method of indifference curves 
introduced by Pareto), they advance the following argument. Let us assume that 
the system of preferences of the person X is complete not only with respect to 
alternative events which, when chosen, occur with certainty but also with 
respect to uncertain events with given numerical probabilities; this means that 
X is able to say, for example, which of the following two alternative events he 
prefers or whether they are equally desirable to him: (1) he receives $1.00 in 
cash, or (2) he receives a lottery ticket which represents a chance of obtaining 
$100 with the probability 0.01. The authors show that this complete system of 
the preferences of X determines a quantitative concept of utility for X in all 
its essential features, leaving open only the choice of a zero point and a unit 
of the utility scale. The resulting numerical utility is “that thing for which the 
calculus of mathematical expectations is legitimate” (p. 28). 


Many investigations by economists concerning decisions made by a 
person X (including the discussion of utility by Neumann and 
Morgenstern just mentioned) are restricted to cases in which X knows the values 
of probability for certain events, especially for anticipated consequences of 
possible actions. The term ‘probability’ is understood in these 
investigations in the sense of probability₂, i.e., relative frequency. According to our 
conception, however, the determination of a practical decision can be 
based on the values of probability₁; knowledge of the values of probability₂ 
is not necessary. Now it is true that, if a value of probability₂ is known 
to X, that is, contained in the evidence available to X, then the 
corresponding value of probability₁ with respect to this evidence is equal to 
the value of probability₂, that is, the known relative frequency. (This 
follows from our considerations in § 41C. It will be shown in more exact 
terms later; see the remarks on T94-1e.) Therefore the numerical values 
obtained for probability or mathematical expectation in those 
investigations can be accepted from the point of view of our theory, because these 
values may be reinterpreted as values of the corresponding inductive 
concepts. But the approach described has a serious disadvantage: its 
domain of application has very narrow limits. Although the relevant 
values of probability₂ are known in certain cases, e.g., in many of those 
concerning games of chance, they are unknown in the great majority of 
cases concerning ordinary economic decisions, e.g., buying, selling, 
investing, and the like. Thus the method described excludes most of the 
problems relevant for economics. Now the decisive point is that the 
limitation mentioned is entirely unnecessary, if inductive logic is accepted. If 
the investigations in question were to use the concept of probability₁ 
instead of probability₂, the limitations would disappear, because the values 
of probability₁ cannot be unknown in the same sense as those of 
probability₂ (see § 41D, the last paragraph). If X knows the frequency of a 
relevant property M only for a sample which he has observed, then the 
probability₂, i.e., the relative frequency of M in the whole population, is 
unknown to him. But he can calculate the probability₁ of a hypothesis 
which ascribes M to an unobserved individual. This value of probability₁ 
is simultaneously the estimate of the unknown value of probability₂ in 
question (§ 41D). This value is sufficient as a basis for X’s decision. 

When a method for measuring utility is found, a law must be established 
which states a functional relation between a gain, either in money or in 
goods having a money equivalent, and the utility of this gain (in other 
terms: between a physical gain and the corresponding moral gain, an 
objective value and the corresponding subjective value). The law of 
diminishing marginal utility is a law of this kind. But, although of great 
importance, it is not sufficient because it states a relation merely in 
comparative terms. What is needed is a quantitative law which enables X to 
determine beforehand the utility of an expected gain in money or goods. 
This is necessary for him in order to calculate the estimate of the resulting 
utility for each of his possible actions. And this again is required in order 
to enable him to choose the most promising course of action. The problem 
of a quantitative law will soon be discussed further. At the moment let us 
assume that it were solved. Then the maximizing principle could be stated 
in the form of the following rule: 


Rule R₅. Among the possible actions choose that one for which the estimate
of the resulting utility is a maximum.


This rule is analogous to the previous rule R₄ (§ 50E). The difference is
merely that R₅ refers to the utility instead of the money amount of the
gain. Within those limits where the utility is proportional to the gain, the 
old rule arrives at the same results as the new one. This holds for those
situations in which the absolute amount of any possible gain of X is small
in relation to his initial fortune.

If the fortune changes from $f_0$ to $f_1$, and hence the gain is $g = f_1 - f_0$,
we designate the corresponding utility by ‘$\mathfrak{g}$’. We shall also speak of the
total utilities $\mathfrak{f}_0$ and $\mathfrak{f}_1$ corresponding to the fortunes $f_0$ and $f_1$, respectively.
However, these terms will occur in our calculations merely as auxiliary
terms; the result will always be expressed, not as a value of the total utility
itself, but as a difference between two values of total utility, that is, as a
utility gain. Thus ‘$\mathfrak{f}_0$’ and ‘$\mathfrak{f}_1$’ are not interpreted separately, but only a
term like ‘$\mathfrak{f}_1 - \mathfrak{f}_0$’; the latter is to be understood as the utility, positive
or negative, which would result for X if his fortune were to change from
$f_0$ to $f_1$.


B. Daniel Bernoulli’s Law of Utility 


That the distinction between the amount of money gained and its 
utility value is of great importance in practical applications of probability 
was recognized very early in the development of the theory of probability. 
Daniel Bernoulli, a nephew of the great Jacob Bernoulli, was the first to 
investigate this distinction clearly and systematically in his work [Speci- 
men] published in 1738. He even proposed a particular quantitative law 
connecting the two magnitudes; see (2) below. This law enabled him to 
solve a number of problems, among them the so-called Petersburg Para- 
dox. His theory was later reproduced and further developed by Laplace 
in a chapter of his chief work called “De l'espérance morale” ([Théorie], 
pp. 432-45). In Laplace’s terminology the distinction is made between the
‘physical fortune’ measured in monetary units and the ‘moral fortune’, 
and hence between the ‘physical gain’ and the ‘moral gain’, that is, the 
satisfaction or utility. The estimate of the physical gain was usually called 
‘mathematical expectation’; Laplace contrasts to it the ‘moral expectation’
or ‘moral hope’ (‘espérance morale’), i.e., the probability-mean estimate
of the moral gain.


(2) Daniel Bernoulli's law of marginal utility. Suppose that X’s fortune
changes from $f_0$ to $f_0 + \Delta f$, where the gain $\Delta f$ (positive or
negative) is small in comparison with $f_0$. Then the marginal utility
$\Delta\mathfrak{f}$ (positive or negative) of this change for X is
(a) proportional to the gain $\Delta f$,
(b) inversely proportional to the initial fortune $f_0$.
Hence: $\Delta\mathfrak{f} = k\,\Delta f / f_0$, where k is a constant which is characteristic
of the person X at the time in question.


The stipulation (a) seems rather obvious. If X’s fortune is large, say,
10,000, then he will derive from a positive gain of 2 twice as much satis- 
faction as from one of 1; and he will suffer from a loss of 2 twice as much 
as from one of 1. The decisive point in the law is (b). This is in accord 
with the law of diminishing utility (1) but is more specific. (1) says merely
that, for the same gain $\Delta f$, the marginal utility $\Delta\mathfrak{f}$ decreases with increasing
$f_0$; (2) states quantitatively how it decreases. It says that the utility
for X of a positive gain of 1 is twice as high if his fortune is 5,000 than if 
it is 10,000. 

The following theorem (3) refers to any change in fortune, whether 
small in relation to the initial fortune or not. It is mathematically deduced 
from (2) by dividing a large change (say, from 10,000 to 11,000) into many 
small changes (say, a thousand additions of 1 each), to which (2) can be 
applied; it is assumed that the utility of the large change is equal to the 
sum of the utilities of the small changes. (Exactly speaking, (3) is deduced 
by integration from the differential form of Bernoulli’s law; see below.) 


(3) Corollary to Daniel Bernoulli’s law. Two changes (positive or negative,
small or large) in fortune, say, from $f_0$ to $f_1$ and from $f_2$ to $f_3$,
have equal utilities (positive or negative) if and only if the ratios
of increase are equal:


$\mathfrak{f}_1 - \mathfrak{f}_0 = \mathfrak{f}_3 - \mathfrak{f}_2$ if and only if $f_1/f_0 = f_3/f_2$.


Thus, in order to cause equal increases in utility (the total utility 
growing in an arithmetic progression), the fortune must increase 
by equal ratios, hence in a geometric progression (e.g., 100, 200, 
400, 800, etc.). 
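
The passage from the law for small changes to the corollary can be checked numerically. The following is only a rough sketch: the function names, the choice k = 1, and the use of the natural logarithm are assumptions made for illustration (a different base merely rescales k). It sums a thousand small utility increments of the form k·Δf/f and compares the total with the logarithmic expression, and it confirms that successive doublings of the fortune yield equal utilities.

```python
import math

def utility_gain(f_start, f_end, k=1.0):
    """Utility of a change in fortune in logarithmic form: k * log(f_end / f_start).
    k = 1 and the natural logarithm are illustrative assumptions."""
    return k * math.log(f_end / f_start)

def utility_by_small_steps(f_start, f_end, steps, k=1.0):
    """Approximate the same utility by adding up k * (small gain) / (current fortune)."""
    step = (f_end - f_start) / steps
    total, fortune = 0.0, f_start
    for _ in range(steps):
        total += k * step / fortune   # the law for small changes, applied once
        fortune += step
    return total

exact = utility_gain(10_000, 11_000)
approx = utility_by_small_steps(10_000, 11_000, steps=1_000)

# Fortunes in geometric progression (100, 200, 400, 800) give equal utility gains.
doubling = [utility_gain(a, 2 * a) for a in (100, 200, 400)]

print(exact, approx, doubling)
```

The two totals agree closely but not exactly, since a step of 1 is small in relation to 10,000 without being infinitesimal.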

There is a striking analogy between Daniel Bernoulli’s law and the 
Weber-Fechner psychophysical law, which says that the intensity of a 
sensation, e.g., the pressure sensation in the skin, grows by equal amounts 
if the physical intensity of the stimulus, e.g., the physical force of a body 
pressing the skin, grows by equal ratios. The fortune corresponds to the 
stimulus, the (total) utility to the sensation. 


Bernoulli’s law and some consequences of it, which are explained in the text 
in a less technical form, will here be stated briefly in their exact technical form. 
For the sake of simplicity, we have formulated (2) in terms of a small increase
$\Delta f$. The actual form of Bernoulli’s law states the same relation for the limiting
case, that is, it has the differential form 


(4) $d\mathfrak{f} = k\,df/f$.

Suppose that the fortune changes from $f_0$ to $f_1$, and hence the gain is
$g = f_1 - f_0$. Then the utility corresponding to this change is
$\mathfrak{g} = \mathfrak{f}_1 - \mathfrak{f}_0 = k \int_{f_0}^{f_1} df/f$. Hence:

(5) $\mathfrak{g} = k(\log f_1 - \log f_0) = k \log (f_1/f_0)$.

The corollary (3) follows immediately from this.
Let X have the fortune $f_0$. Let $g_1, g_2, \ldots$ be the possible gains resulting
from a certain action, and $q_1, q_2, \ldots$ their probabilities. Then the utility in
the case of $g_1$ is, according to (5), $\mathfrak{g}_1 = k[\log (f_0 + g_1) - \log f_0]$. $\mathfrak{g}_2$, etc., are
analogous. Therefore, the estimate of the utility is (§ 41D(3))


(6) $\mathfrak{g}' = k[q_1 \log (f_0 + g_1) + q_2 \log (f_0 + g_2) + \ldots - (q_1 + q_2 + \ldots) \log f_0]$
$= k\{\log [(f_0 + g_1)^{q_1}(f_0 + g_2)^{q_2} \ldots] - \log f_0\}$.


We shall now determine that gain which, if it occurred on the basis of $f_0$, would
cause the utility $\mathfrak{g}'$ (which does not actually occur but has just been determined
as an estimate); let us call it ‘$^*g'$’. According to (5), $\mathfrak{g}' = k[\log (f_0 + {}^*g') - \log f_0]$.
This, together with (6), yields:


(7) $^*g' = (f_0 + g_1)^{q_1}(f_0 + g_2)^{q_2} \ldots - f_0$.


This is Daniel Bernoulli's main theorem, from which he draws important con- 
sequences for various problems. We shall illustrate in the text some of his theo- 
rems with the help of our examples; for the sake of easier understanding, we 
shall not make use of (5) or (7) but derive the results by elementary means, 
using only the corollary (3). 

For a given $f_0$, $^*g'$ increases with increasing $\mathfrak{g}'$. Therefore our rule R₅, referring
to the maximum value of $\mathfrak{g}'$, could as well refer to that of $^*g'$.
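
The relation between the estimate of the utility and its money equivalent can be made concrete with a small computation. What follows is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, k is set to 1, the natural logarithm is used, the probabilities are assumed to sum to 1, and every resulting fortune is assumed to be positive. The input values (a fortune of 10,000 and an even-odds bet of 8,000) are chosen only for illustration.

```python
import math

def estimated_utility(f0, gains, probs, k=1.0):
    """Probability-weighted mean of the utilities k * log((f0 + g_i) / f0)."""
    return sum(q * k * math.log((f0 + g) / f0) for g, q in zip(gains, probs))

def money_equivalent(f0, gains, probs):
    """Product of (f0 + g_i) ** q_i, minus f0; the constant k drops out."""
    product = 1.0
    for g, q in zip(gains, probs):
        product *= (f0 + g) ** q
    return product - f0

f0 = 10_000
gains, probs = [8_000, -8_000], [0.5, 0.5]   # illustrative even-odds bet

u_est = estimated_utility(f0, gains, probs)
g_star = money_equivalent(f0, gains, probs)

# A sure gain of g_star would have exactly the estimated utility.
u_of_g_star = math.log((f0 + g_star) / f0)
print(u_est, g_star, u_of_g_star)
```

Because the money equivalent grows with the estimated utility for a fixed initial fortune, ranking actions by either quantity gives the same choice.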

The content of Daniel Bernoulli’s treatise is summarized by Todhunter
([History], pp. 213-22). His conception and its consequences are discussed in
many books on probability; see, e.g., Czuber [Wahrsch.], I, 235-45, Keynes
[Probab.], pp. 317 f. (the formula at the top of p. 318, corresponding to our (7),
is misprinted), Fry [Probab.], pp. 195 f. Bernoulli’s law, chiefly in its comparative
form (1), the law of diminishing marginal utility, has become the foundation
of the modern theory of value in economics, which was founded by
Stanley Jevons (1871), Carl Menger (father) (1871), and Léon Walras (1874),
and is based on the concept of marginal utility.

On the other hand, the quantitative form of Bernoulli’s law is usually re- 
garded by modern authors as an oversimplification. It is pointed out that dif- 
ferent kinds of commodities may require different forms of a quantitative law 
and that the simultaneous consideration of several commodities ought to take 
into account their relationships (Vilfredo Pareto: ‘complementary goods’ and 
‘competitive goods’). Furthermore, doubts have been expressed concerning the 
adequacy of the particular form of the law chosen by Bernoulli, and other forms 
have been proposed. Compare: Ludwig Frick [Einleitung], H. E. Timerding 
[Bernoulli], Ch. Jordan [Bernoulli], Harold T. Davis, The theory of econometrics 
(Bloomington, Ind., 1941). Gerhard Tintner introduces the concept of a risk 
preference functional: with its help he explains economic behavior, e.g., in 
betting or business, as dependent upon the entire probability function rather 
than merely upon its mean or other parameters ([Choice], [Contribution]; further:
“The theory of production under non-static conditions”, Journal of Political
Economy, 50 [1942], 645 ff.). Karl Menger (son) [Wertlehre] has made a
careful analysis of the whole problem. (This analysis leads to a clarification of 
the so-called Petersburg Paradox, which seems more satisfactory than the 
various earlier attempts of a solution.) After a critical examination of the laws 
proposed by Daniel Bernoulli and others he shows that the law, in order to
represent the actual behavior of most people, would have to possess the following
features among others, in addition to satisfying the principle of diminishing
marginal utility. The utility of a gain approaches zero as the initial fortune
grows. There is a certain saturation value which the curve of the total utility
does not exceed but approaches asymptotically. Further, X does not simply
try to maximize the probability₁-mean, in other words, the effect of an expected
gain on the decision of X is not measured by the product of its utility and its
probability; instead, very small probabilities are “underestimated”, that is to
say, their effect is smaller than the product mentioned and becomes even 0 for
sufficiently small, though still positive, probabilities. Probabilities near to 1 are
likewise “underestimated”, while certain intermediate probabilities are “overestimated”.
Then there is a certain fraction $g_X$, usually < 1, such that X,
possessing the fortune $f_0$, is not willing to risk more than the amount $g_X f_0$ even
for the best of chances; $g_X$ depends upon the person X and to some extent also
on the situation. Menger does not propose any particular law either for the
utility or for the determination of X’s decision. He believes that the form of 
such a law changes from person to person and would therefore have to contain 
many parameters characteristic of person or situation. Although the law would 
have a quantitative form, it could not be used for the actual determination of 
quantitative values with respect to a given person X without first measuring 
the values for X of all the parameters involved. Menger regards as the essential 
features of such a law not so much its quantitative form and the values of the 
parameters involved, but rather certain comparative characteristics, some of 
which are stated by him in a general, comparative form. 


In the case of an alternative, we called one of the two possible actions 
favorable, unfavorable, or neutral, respectively, if $g_1'$ (the estimate of the
gain in the case of this action) is greater than, less than, or equal to $g_2'$, respectively.
On the basis of the distinction between the gain g and its utility (or subjective
value) $\mathfrak{g}$, we should use now the more explicit terms ‘objectively
favorable’, etc., and contrast them with ‘subjectively favorable’, etc. If 
the estimate of the utility is higher for one action than for the other, the 
first may be called subjectively favorable, the second subjectively un- 
favorable; if the estimates are equal, the actions and the offer are called 
subjectively neutral. 


C. Consequences of Bernoulli's Law 


We shall now explain two important conclusions which Daniel Bernoulli 
has derived from his law. They will be illustrated by application to those 
of our previous examples in which the earlier rule R₄ (§ 50E) led to unreasonable
decisions. Then the application of the new rule R₅ to these
cases will be discussed. We shall find that this rule leads to reasonable 
decisions also in these cases; thus it overcomes the difficulties previously 
explained (§ 50E). 

For the more general, comparative, form of the results, we shall use 
only the comparative law of diminishing utility (1). For the more specific
quantitative results, we shall assume Bernoulli’s law; it will, however, be
sufficient to use its corollary (3). We do not mean to adopt the law. To 
decide whether and to what extent the law holds is the task of psychology, 
not of inductive logic; therefore, we abstain from a judgment on this 
question. 

If we wanted to calculate actual numerical values of the utility of gains, 
we should have to specify the numerical value of the parameter k occur- 
ring in Bernoulli’s law. If, furthermore, we wanted to calculate numerical 
values of total utilities themselves and not only of their differences, we 
should have to specify the numerical value of another parameter (the 
constant of integration appearing when (4) is integrated). However, this 
can be avoided by a device which was used by Bernoulli and Laplace: instead
of characterizing a utility $\mathfrak{g}$ (with respect to an initial fortune $f_0$) by
its numerical value on the psychological scale of utility, which actually has
not been established, it is characterized by the equivalent money gain,
which we designate by ‘$^*g$’. This is meant as that gain in money which
(on the basis of $f_0$) would have the utility $\mathfrak{g}$. Analogously, $f_0 + {}^*g$ is
designated by ‘$^*f$’. If we start from a gain g and then consider its utility $\mathfrak{g}$,
there is, of course, no point in using the concept and symbol just introduced,
because $^*g$ is simply g. However, if the utility $\mathfrak{g}$ has not been determined
as that of a given money gain but in some other way, for instance,
as an estimate, then the use of ‘$^*g$’ will be convenient, as we shall
see. 

The first important result is that even a game or bet which is fair, that 
is, objectively neutral for either partner, is subjectively unfavorable for 
both. Let us take our example of a bet at even odds on a throw of a symmetrical
coin. Let X’s initial fortune be $f_0$ and the stake u. Then the resulting
fortune is either $f_1 = f_0 + u$ or $f_2 = f_0 - u$. The estimate $f'$ of the
resulting fortune is the arithmetic mean of the two possible and equiprobable
results, that is, $f_0$. Hence the estimate of the gain is $g' = 0$.
However, with respect to the utility the situation is quite different. Let
$\mathfrak{f}_0$, $\mathfrak{f}_1$, and $\mathfrak{f}_2$ be the total utilities corresponding to $f_0$, $f_1$, and $f_2$, respectively.
Since the two results $f_1$ and $f_2$ are equiprobable, the estimate $\mathfrak{f}'$ of
the resulting total utility is their arithmetic mean; hence (i) $\mathfrak{f}_1 - \mathfrak{f}' = \mathfrak{f}' - \mathfrak{f}_2$.
According to the law of diminishing marginal utility (1), the utility
corresponding to a change in fortune from $f_0$ to $f_1 = f_0 + u$ is less than
that which would correspond to a change from $f_2 = f_0 - u$ to $f_0$, because
$f_0 > f_2$. In other words, (ii) $\mathfrak{f}_1 - \mathfrak{f}_0 < \mathfrak{f}_0 - \mathfrak{f}_2$. Hence with (i): (iii) $\mathfrak{f}' < \mathfrak{f}_0$.
The estimate $\mathfrak{g}'$ of the utility $\mathfrak{g}$ is $\mathfrak{f}' - \mathfrak{f}_0$; according to (iii), this is negative.
Therefore, accepting the bet is subjectively unfavorable, although objectively
neutral.

Now let us examine the same situation quantitatively, with the help
of the corollary (3) to Bernoulli’s law. Let $^*f'$ be the fortune corresponding
to $\mathfrak{f}'$; that is to say, if a change in fortune from $f_0$ to $^*f'$ were to occur,
it would cause a change in total utility from $\mathfrak{f}_0$ to $\mathfrak{f}'$ and hence the additional
utility would be $\mathfrak{f}' - \mathfrak{f}_0$. Applying (3) to the equality of differences
in total utility (i), we obtain an equality of ratios of the corresponding
fortunes: (iv) $f_1 : {}^*f' = {}^*f' : f_2$; in other words, $^*f'$ is the geometric
mean between $f_1$ and $f_2$. Since the geometric mean of two distinct positive
numbers is always less than their arithmetic mean, we have: (v) $^*f' < f_0$.
This is a merely comparative result, essentially the same as the earlier
result (iii). But (iv) is a quantitative result allowing numerical calculations.
In our previous example, the initial fortune was $f_0$ = 10,000, the
stake u = 8,000; hence $f_1$ = 18,000 and $f_2$ = 2,000. $^*f'$ is the geometric
mean of the latter two values, hence 6,000. Therefore the monetary
equivalent $^*g'$ of the estimate $\mathfrak{g}'$ of the utility has the negative value of
−4,000. This means that if X accepts the bet with the stake of 8,000,
the estimate of his resulting total utility corresponds to a fortune of only
6,000; in other words, acceptance of the bet is equivalent in utility to
throwing 4,000 out of the window; hence it is rather disadvantageous.
Rule R₅ prescribes maximization of the estimate of the utility. Therefore
it prohibits the acceptance of the bet, in agreement with common sense.
Thus rule R₅ overcomes the first of the difficulties we found in connection
with rules R₃ and R₄.

In the case just considered, the negative utility, measured by the
equivalent monetary loss of 4,000, is enormous. This is due to the large
stake. The accompanying table shows also the results for smaller values
of the stake u, the initial fortune being always $f_0$ = 10,000. The calculation
is as follows: $^*f'$ is the geometric mean between 10,000 + u and
10,000 − u; $^*g'$ = $^*f'$ − 10,000. We see from the last column of the table
that the absolute amount of the monetary equivalent of the estimate of
the utility decreases rapidly with decreasing stake.


                              Money Equivalent of
                              the Estimate of the
  Stake                       Resulting Utility
    u          *f'                   *g'

  8,000     6,000                −4,000
  1,000     9,949.88                −50.12
    100     9,999.50                 −0.50
     10     9,999.995                −0.005
      1     9,999.99995              −0.00005
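
The table can be recomputed directly from the geometric-mean rule. The snippet below is a minimal sketch; the function name is an assumption introduced here, and the logarithmic form of Bernoulli's law is taken for granted. Its output should agree with the table up to rounding in the last printed digit.

```python
import math

def fair_bet_equivalent(f0, stake):
    """Fortune and money gain equivalent in utility to a fair even-odds bet.
    Assumes the logarithmic utility law, so the equivalent fortune is the
    geometric mean of the two possible resulting fortunes."""
    star_f = math.sqrt((f0 + stake) * (f0 - stake))
    return star_f, star_f - f0

rows = [(u, *fair_bet_equivalent(10_000, u)) for u in (8_000, 1_000, 100, 10, 1)]
for u, star_f, star_g in rows:
    print(f"{u:>6,}  {star_f:>14,.5f}  {star_g:>14.5f}")
```

Every money equivalent is negative, and its absolute amount shrinks roughly with the square of the stake, which is why small bets cost almost nothing in utility.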

So far the decisions resulting from rule R₅ seem to be in accord with
common sense. But now the question arises whether this rule is not too
rigorous by declaring all fair bets as disadvantageous even if the stake 
of X is small in relation to his fortune. We feel that a reasonable friend, 
although he would warn X against a bet of a thousand dollars, would not 
try to dissuade him from a bet of one dollar, and perhaps not even from 
one of ten dollars. The estimate of the utility is found to be equivalent 
to a loss of one two-hundredth of a cent in the first case and one-half of 
a cent in the second. One might say that these amounts, though practical- 
ly negligible, are anyway negative amounts and hence indicate that even 
bets for these moderate stakes are, strictly speaking, disadvantageous; one 
might think that if the rule prohibits these bets, it seems questionable 
whether it is in accord with sound common sense. However, the rule does 
not unconditionally prohibit these bets. It says merely that the bet is disadvantageous
if the positive utility from winning and the negative utility
from losing are all the utility factors involved. If there are other factors 
in the situation, they must be introduced into the calculation, and then 
the result may be different. It may be, for example, that X derives some 
pleasure from the excitement of the bet or from pleasing his friend who 
wishes to bet. Even if this pleasure is small, it may easily be sufficient to 
outweigh the displeasure equivalent to the loss of a fraction of one cent. 
If so, the rule leads to the decision of accepting the bet of ten dollars. If 
the stake is one hundred dollars, the estimate of the utility is equivalent to 
the loss of fifty cents. If the additional pleasures are not worth to X half 
a dollar, the rule will result in X’s decision to decline the bet. 

It is important to recognize clearly that rule R₅ does not tell X in any
way how to valuate things; whether he should prefer the excitement of
gambling or the peace of mind caused by abstaining from gambling;
whether to help Y in his business affairs, or to defeat him by honest but
ruthless operations, or to cheat him. The rule is not a moral rule but a
rule of applied logic. (Therefore Laplace’s terms ‘moral fortune’, ‘moral
gain’, and ‘moral expectation’ are somewhat misleading.) This means that
the rule does not lay down value standards by which to judge, to approve
or to disapprove our desires. It presupposes that X has a fixed set of interests
or needs; the rule has merely the task of helping X in finding out
which actions are consistent with his needs and which are not. It does so,
not a priori, but on the basis of the empirical knowledge which X has collected
by his previous experiences.

Now let us examine the second difficulty which we found in connection 
with the previous rule R₄. This rule prohibits taking out fire insurance if
it is unfavorable, that is, if the premium, as is usually the case, is higher
than the fair premium, i.e., the estimate of loss by fire. Common sense, on
the other hand, advises one to insure provided the premium is not exorbi- 
tant. In our example, X has the opportunity to insure his house, valued 
at 10,000, for one year against fire at a premium of r > 10. The proba- 
bility that the house will burn down during the year is 1/1,000. The prob- 
lem is whether from the point of view of utility it is advisable for him to 
buy the insurance. The answer depends not only upon the amount r of 
the premium but also upon the present fortune fo of X. If fo is not much 
larger than 10,000, in other words, if X does not own much besides the 
house, then the destruction of the house, if not insured, would reduce X’s 
fortune to a small fraction of its present value. This reduction would cause 
a very great negative satisfaction, not only a thousand times as great as 
the expense of 10 but much more. Therefore, in this case, X would do well 
to pay the premium r, even though it is more than 10, provided it is not 
too high. 

In order to obtain quantitative values, let us assume again Bernoulli’s
law. Suppose that X has, in addition to the house, only 100 in cash; hence
$f_0$ = 10,100. If he buys the insurance, his loss is r. If he does not, there are
two possible cases: the house may burn down or not. In the first case the
resulting fortune is $f_1$ = 100; in the second case, $f_2$ = 10,100. The probability
of the first case is $q_1$ = 0.001; that of the second, $q_2$ = 0.999. Let
the total utilities corresponding to the fortunes $f_0$, $f_1$, and $f_2$ be $\mathfrak{f}_0$, $\mathfrak{f}_1$, and $\mathfrak{f}_2$,
respectively. Then the estimate $\mathfrak{f}'$ of the resulting total utility is, according
to the definition of estimate (§ 41D(3)), $q_1\mathfrak{f}_1 + q_2\mathfrak{f}_2$, that is,
$0.001\,\mathfrak{f}_1 + 0.999\,\mathfrak{f}_2 = \mathfrak{f}_1 + 0.999(\mathfrak{f}_2 - \mathfrak{f}_1)$. In other words, if we divide the distance
between $\mathfrak{f}_1$ and $\mathfrak{f}_2$ on the $\mathfrak{f}$-scale in one thousand equal parts, $\mathfrak{f}'$ is the last
dividing point preceding $\mathfrak{f}_2$. Now, according to the corollary (3), to equal
differences on the $\mathfrak{f}$-scale correspond equal ratios on the f-scale. Therefore,
in order to find the value $^*f'$ on the f-scale corresponding to $\mathfrak{f}'$, we have
to divide the segment of the f-scale between $f_1$ and $f_2$ in one thousand
parts, not parts of equal length but parts for which the quotient of one
value divided by the preceding one is always the same, say, q. Thus the
successive points on the f-scale have the values $f_1, f_1 q, f_1 q^2, \ldots, f_1 q^{1000} = f_2$.
Hence q is to be calculated as the thousandth root of $f_2/f_1$.
For $f_2$ = 10,100 and $f_1$ = 100, we find q = 1.004626. Then $^*f'$ is found
as the last value preceding $f_2$; hence $f_1 q^{999}$ or $f_2/q$ = 10,053.50. This
amount is less by 46.50 than the initial fortune $f_0$ = 10,100. This result
means that, if X does not insure the house, the estimate of his resulting
total utility corresponds to a fortune of 10,053.50; hence the estimate of
the resulting increase in utility is negative. It is measured by the corresponding
gain $^*g' = {}^*f' - f_0$, which we found to be −46.50. Therefore,
if the premium r is less than 46.50, rule R₅ advises X to buy the insurance,
because in this case the estimate of the utility (corresponding to a gain
of −r) is higher than in the case of noninsurance (where it corresponds
to −46.50).
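
The division of the fortune scale into a thousand parts of equal ratio can be reproduced step by step. The sketch below is illustrative only: the function and parameter names are assumptions introduced here, and the logarithmic utility law is presupposed. It also checks that the result coincides with the probability-weighted geometric mean of the two possible fortunes.

```python
def uninsured_equivalent_fortune(f_burned, f_intact, p_fire, parts=1000):
    """Fortune equivalent in utility to remaining uninsured.

    The segment from f_burned to f_intact is divided into `parts` pieces with
    a constant ratio q; the estimate lies (1 - p_fire) * parts pieces above
    f_burned. Returns the equivalent fortune and the ratio q."""
    q = (f_intact / f_burned) ** (1 / parts)
    steps_up = round((1 - p_fire) * parts)
    return f_burned * q ** steps_up, q

f0 = 10_100
star_f, q = uninsured_equivalent_fortune(f_burned=100, f_intact=f0, p_fire=0.001)
neutral_premium = f0 - star_f

# Same value, written as a weighted geometric mean of the two fortunes.
weighted_geometric_mean = 100 ** 0.001 * f0 ** 0.999

def should_insure(premium):
    """Insurance is subjectively favorable when it costs less than the neutral premium."""
    return premium < neutral_premium

print(q, star_f, neutral_premium, should_insure(30), should_insure(60))
```

Computed in floating point, the neutral premium comes out within about a cent of the value 46.50 stated in the text.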

In this example the premium which is subjectively neutral (46.50) is 
rather large in comparison with the premium which is fair, i.e., objectively 
neutral (10). This is due to the fact that in this example the value of the 
house constitutes a large part of the initial fortune (10,100), indeed nearly 
all of it. The accompanying table shows the results for other values of the
initial fortune $f_0$, but always for the same value of the house (h = 10,000).
The calculation is as follows:


$q = [f_0/(f_0 - 10{,}000)]^{0.001}$;
$^*f' = f_0/q$; $r = f_0 - {}^*f' = f_0(1 - 1/q)$.


  Initial                               Subjectively
  Fortune      f₀/h         q          Neutral Premium
    f₀                                       r

   10,100      1.01     1.004 626          46.50
   15,000      1.5      1.001 100          16.50
   20,000      2        1.000 693          13.86
   40,000      4        1.000 288          11.52
  100,000     10        1.000 105 4        10.54


The table shows that, the higher the initial fortune $f_0$, the lower the
subjectively neutral premium. If $f_0$ is ten times the value of the house, the
subjectively neutral premium is only 10.54, hence not much higher than
the fair, that is, objectively neutral premium (10). Since the premium
demanded by an insurance company will usually be higher than 10.54,
this result means that, in the last case, insurance at available rates is not
only objectively but even subjectively unfavorable. This result is in accord
with what is regarded as sound thinking in business. Even people of
cautious character often prefer to leave a certain item, a house, a car, or
the like, uninsured in view of prevailing insurance rates if the value of this
item is only a small part of their whole fortune.
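
The dependence of the subjectively neutral premium on the initial fortune can be tabulated with a few lines of code. This is a sketch under the same assumptions as the text (logarithmic utility, house value 10,000, fire probability 1/1,000); the function name and parameter names are hypothetical. The computed premiums agree with the table to within a few hundredths.

```python
def neutral_premium(f0, house=10_000, p_fire=0.001):
    """Ratio q and the premium at which insuring and not insuring have the
    same estimated utility, assuming the logarithmic utility law.
    Requires f0 > house, so that the fortune after a fire stays positive."""
    q = (f0 / (f0 - house)) ** p_fire
    return q, f0 * (1 - 1 / q)

fortunes = (10_100, 15_000, 20_000, 40_000, 100_000)
table = [(f0, f0 / 10_000, *neutral_premium(f0)) for f0 in fortunes]
for f0, ratio, q, r in table:
    print(f"{f0:>8,}  {ratio:>5.2f}  {q:.7f}  {r:>6.2f}")

fair_premium = 0.001 * 10_000   # objectively neutral premium
```

The premiums fall toward the fair premium of 10 as the fortune grows but always stay above it, which matches the remark that a wealthy owner may reasonably leave a small item uninsured.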

Thus we see that the present rule R₅, making the estimate of the utility
rather than that of the money gain decisive in the choice of actions, overcomes
the difficulties involved in the previous rule R₄. It may be assumed
that the present rule or a similar one using likewise inductive concepts
like probability₁, estimate, etc., would be adequate as a “guide of life,”
that is, as an explicatum for the vague concept of a reasonable decision as
explicandum if the following two requirements were fulfilled. (i) A quantitative
law must be found which states either the value of the total utility
as a function of the fortune or (like (2)) the value of the increase in utility 
as a function of the gain and the initial fortune. This law will contain cer- 
tain parameters whose values depend upon person and time. As mentioned 
earlier, it seems today plausible to assume that this law cannot have the 
simple form stated by Daniel Bernoulli but must have a more general and 
more complicated form. This is a problem to be solved by psychological 
investigation. (ii) The rule uses the concept of estimate. Therefore an 
adequate explicatum of this concept is required. If an adequate quantitative
explicatum for probability₁ can be found, a concept of estimate can
be defined as the probability₁-mean (§ 41D(3)). An alternative procedure
would consist in constructing an independent definition of estimate (i.e.,
one not based on probability₁) or various methods of estimation for various
magnitudes. Some contemporary statisticians investigate methods of
estimation which are independent in this sense, because they do not believe
in the possibility of an adequate quantitative explicatum for probability₁
(see § 98). In any case the development of methods of estimation,
whether based on probability₁ or independent, is a task of quantitative
inductive logic.


§ 52. On the Arguments of Degree of Confirmation 


We may choose between two logical types for the arguments of degree of 
confirmation. In method (1) the arguments are entities expressed by sentences, 
e.g., propositions, events, or the like. In method (2) the arguments are sen- 
tences, and hence names of sentences are written as argument expressions. In 
method (1) the sentences about degree of confirmation belong to the object 
language, in (2) to the semantical metalanguage. (1) is more customary; we 
shall however use (2), because here the language can be extensional (truth-functional)
and we do not need a modal logic as basis. At any rate, the difference
is only a technical one; all our theorems of inductive logic can easily be
translated into form (1). Incidentally, the analogous problem for probability₂
(relative frequency) is discussed; here, the customary formulation (1) in the
object language seems preferable. The degree of confirmation is relative not
only with respect to the evidence, but, like all semantical concepts, also with
respect to the language system.


In the rest of this chapter (§§ 52-54), some preliminary considerations 
will be made which will help us to find a way for solving our task, the construction
of a quantitative inductive logic for our systems $\mathfrak{L}$. This construction
will then be begun in the following chapter.

We aim at finding, as a basis for inductive logic, a definition for the
degree of confirmation $\mathfrak{c}$ as a quantitative explicatum for probability₁. It
is essential that $\mathfrak{c}$ is a function of two arguments, because, as explained
earlier (§ 10A), probability₁ is a relative concept which is dependent not
only upon the hypothesis in question but also upon the evidence. 

Whether propositions or sentences expressing the propositions are taken 
as arguments of the degree of confirmation is merely a technical question, 
not a question of the conception of the nature of probability₁. We shall
now discuss both alternatives; thereby the reason why we prefer the 
second will become clear. 

1. Both in the classical theory and in more recent theories of prob- 
ability, nearly all authors have taken as arguments (or, in the older period 
when the relativity of probability₁ was not yet clearly recognized, as the
one argument) not sentences but something expressed or described by 
sentences, variously designated as events, possible cases, occurrences, or, 
by more modern authors, for example, Keynes and Jeffreys, as proposi- 
tions. For our present problem we need not discuss here the controversial 
question whether or not events, facts, etc., belong to the same type as 
propositions; we need not even pay attention to the particular terms used 
by the authors for the arguments. The essential point is that the authors 
of this group write sentences (or variables of the type of sentences) as ar- 
gument expressions. In words, this is done in the customary formulation: 


(a) ‘the probability that . . . (on the evidence that - - -) is 1/6’. 


Using a language of symbolic logic, Keynes writes, for example, 
‘P(a/h) = 1/6’ or briefly ‘a/h = 1/6’, and similarly Jeffreys ‘P(q|p) =
1/6’; ‘a’, ‘h’, ‘p’, ‘q’ are variables for which sentences may be substituted.
This way of formulation has an important consequence; probability₁ becomes
here a function of the kind known as intensional (non-extensional,
non-truth-functional). Therefore the theory of probability₁ in this form
must be based not on the form of logical system ordinarily used in symbolic
logic but on an intensional, modal system. Not even the authors
mentioned above, although they use symbolic logic, seem to be aware of
this fact. That probability₁ is here not a truth-function is obvious. [For
instance, from the schema (a) we can first form a true sentence (a₁) by
writing a suitable sentence in the place of the three dashes and another
sentence which happens to be true in the place of the three dots; and then
we can form another sentence (a₂) which is false by putting for the dashes
the same as before but for the dots another true sentence.] How modal
logic comes in may be seen from the following example. In our theory, 
which belongs to the second kind, there is the following theorem (T59-1b): 
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(b) ‘For any sentences e and h, if e is not L-false and the sentence 
e D his L-true (in other words, e L-implies 4), then c(h,e) = 1.’ 


In order to transfer this theorem to the present form of theory, we need 
first, instead of ‘c’, a corresponding symbol, say, ‘c’ (or Keynes’s and 
Jeffreys’ ‘P’), which takes sentences as argument expressions, and further 
a modal sign, say, ‘N’ for logical necessity, which corresponds to ‘L-true’ 
(or ‘}’) but takes a sentence as argument expression. Then the theorem 
corresponding to (b) can be formulated in the following way either in 
words (c) or in symbols (c’): 

(c) ‘For any p and q, if p is not impossible and if it is necessary that 
p ⊃ q (in other words, if p strictly implies q), then c(q,p) = 1.’ 

(c’) ‘(p)(q)[~N~p . N(p ⊃ q) ⊃ c(q,p) = 1]’. 

If one wants to be more explicit, he may write in (c) after ‘any’ a suitable 
noun, say, ‘propositions’ or ‘cases’ or ‘events’ or whichever term of this 
kind he prefers; this is, however, not necessary; it is sufficient to stipulate 
the type of the variables ‘p’ and ‘q’, for instance, by the customary rule 
that sentences may be substituted for them. In contradistinction to (b), 
which belongs to the metalanguage, (c) and (c’) belong to the object 
language. This becomes still clearer when we take a substitution instance of 
(c’), e.g., with ‘Pa’ for ‘p’ and ‘Pa ∨ Pb’ for ‘q’: 

(d) ‘~N~(Pa) . N(Pa ⊃ Pa ∨ Pb) ⊃ c(Pa ∨ Pb, Pa) = 1’. 

If a suitable system of modal logic is used, then the two conjunctive 
components of the antecedent can be proved, and hence we obtain: 

(e) ‘c(Pa ∨ Pb, Pa) = 1’. 

2. The second method of formulation takes sentences as arguments. 
Therefore here, not sentences, but names of sentences (or, in general theo- 
rems, variables for names of sentences, like our ‘K’, ‘i’, etc.), are written 
as argument expressions. Thus here, the sentences about probability₁ or 
degree of confirmation belong to the metalanguage and, in particular, to 
its semantical part, not to the same language as the sentences to which 
they refer. This method has only recently been used by a few authors, for 
instance, Mazurkiewicz and Hosiasson; and we shall use it too. Instead 
of (c) or (c’), we have here the form (b). Here, again, we may form a 
substitution instance corresponding to (d). Taking ‘𝔖₁’ as a name (not an 
abbreviation!) for ‘Pa’ and ‘𝔖₂’ for ‘Pb’, we have: 

(f) ‘If 𝔖₁ is not L-false and 𝔖₁ ⊃ 𝔖₁ ∨ 𝔖₂ is L-true, then 
c(𝔖₁ ∨ 𝔖₂, 𝔖₁) = 1.’ 

Since the two conditions are fulfilled, we may omit the conditional clause; 
thus we obtain, corresponding to (e): 

(g) ‘c(𝔖₁ ∨ 𝔖₂, 𝔖₁) = 1.’ 


This second method has the advantage that here the language to which 
the sentences about probability₁ or c belong may be extensional. Thus 
here a simpler structure can be taken for the underlying deductive logic 
than in the first method. The fact that here we cannot stay within the 
object language but have to use a metalanguage does not seem too high 
a price for the advantage mentioned. 

The choice between the two methods (1) and (2) here discussed for the 
formulation of inductive logic is analogous to the much-discussed choice 
between two well-known methods for the formulation of deductive logic. 
The latter can take the form either (1) of a modal logic in an intensional 
object language, like Lewis’ system of strict implication, or (2) of a theory 
of L-concepts within semantics, as here in the earlier chapter on deduc- 
tive logic (§§ 20-24). (Concerning modal logic and its relation to seman- 
tics cf. [Modalities] § 1 and [Meaning] § 39.) The concepts of L-truth, 
L-falsity, and L-implication in method (2) correspond to the modal 
concepts of necessity, impossibility, and strict implication, respectively, in 
(1). There is so far no agreement as to which of these two methods for 
deductive logic is preferable. Those who prefer here the semantical method 
(2) will presumably also prefer our semantical method (2) for inductive 
logic. However, as said before, the difference is only a technical one; all 
theorems of our inductive logic stated in later chapters can easily be 
translated into formulations according to method (1), as shown by the 
examples (b) and (c) (or (c’)) above. 

Some remarks may be made incidentally on the analogous problem for 
probability₂, that is, relative frequency. As explained earlier (§ 10B), we 
have here likewise two arguments. Method (1) takes properties as 
arguments; method (2) predicates designating those properties. In method (1), 
a probability₂ sentence is formulated in the object language; but here, in 
distinction to probability₁, the language need not be intensional. Since 
coextensive properties have obviously the same cardinal number, and 
probability₂ is definable as a function of cardinal numbers (in the simplest 
form, as a relative frequency, hence a quotient of two cardinal numbers), 
the value of probability₂ does not change if one argument property is 
replaced by another one of the same extension. Therefore here, in 
distinction to probability₁, there is no advantage in taking method (2) and 
thereby going to the metalanguage. The formulation of probability₂ sentences 
in the object language has the great advantage that these sentences, which 
are not, like the probability₁ sentences, purely logical but have a factual 
content, can be dealt with on a par with other factual sentences of science. 
Thus we can, for example, in physics, combine laws of a deterministic 
form with probability₂ laws without any difficulties. Therefore it is 
understandable that, as far as I am aware, all authors on probability₂ have 
used method (1). 
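
The point that relative frequency depends only on extensions can be illustrated by a small sketch. It assumes, purely for illustration, that a property is represented by the finite set of individuals which have it; the names `relative_frequency`, `even`, `divisible_by_two`, and `multiples_of_three` are hypothetical and do not occur in the text.

```python
from fractions import Fraction


def relative_frequency(m_set, k_set):
    """Relative frequency of property M within the reference class K.

    Only cardinal numbers enter: |M and K| divided by |K|.
    """
    if not k_set:
        raise ValueError("reference class must be non-empty")
    return Fraction(len(m_set & k_set), len(k_set))


domain = set(range(1, 13))

# Two differently formulated but coextensive properties (illustrative).
even = {x for x in domain if x % 2 == 0}
divisible_by_two = {x for x in domain if (x // 2) * 2 == x}

# The reference class.
multiples_of_three = {x for x in domain if x % 3 == 0}

freq_a = relative_frequency(even, multiples_of_three)
freq_b = relative_frequency(divisible_by_two, multiples_of_three)
```

Replacing one argument by a coextensive one leaves the value unchanged, which is why an extensional object language suffices for this concept.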

In most applications of probability₁ the first argument, which we call 
the hypothesis, is an assumption about facts not known or insufficiently 
known; it may be a prediction of a single event or a physical law or an 
existential assumption or a complex theory consisting of general and 
nongeneral sentences. In order not to restrict the applicability unduly, we 
shall admit as first argument for c any sentence of our systems 𝔏 of 
whatever form; a theory consisting of various statements will then be 
formulated as a conjunction. The second argument of probability₁, which we call 
the evidence, is often a report on observations, which may be formulated 
as a conjunction of basic sentences (D16-6b). However, we shall not 
restrict the second argument of c to this form but again admit sentences of 
all forms, including general ones, with the sole exception of L-false 
sentences (the term ‘L-false’ is explained and defined in § 20). To make this 
exception is customary and convenient. [We shall see later that c(h,e) may 
be represented as a quotient of two numbers (§ 54B). If e is L-false, both 
these numbers are 0. In arithmetic the function of quotient is not defined 
for the case that both arguments are zero; this restriction is found 
convenient because otherwise (that is, if we were to make an ad hoc 
stipulation for the value of ‘0/0’) some general arithmetical theorems would not 
hold in their customary simple form. For the same reasons it is convenient 
to exclude the case of an L-false second argument (evidence) from the 
domain of definition of the function c.] 

We have earlier (§ 10A) emphasized the relativity of probability₁ or 
degree of confirmation c with respect to the evidence. But it is relative in 
still another way, viz., with respect to the language system. c is, as we 
shall see, closely related to the semantical L-concepts and may even be 
regarded as a quantitative semantical L-concept. Therefore, like all 
semantical concepts, c is dependent upon a language system; any c-sentence 
must, in complete formulation, contain a reference to a language system, 
for instance: ‘c(h,e) = r in the system 𝔏’. For ‘c in the system 𝔏_N’ we shall 
sometimes write ‘_N c’ and for ‘c in the system 𝔏_∞’ ‘_∞ c’. More frequently, we 
shall omit for the sake of brevity the reference to the language system if 
either it is not essential for the point under discussion or the context 
makes sufficiently clear which system or systems are meant; this 
formulation is in line with the customary elliptic formulations of other semantical 
concepts (e.g., ‘true’ instead of ‘true in 𝔏’). The relativity with respect to 
language has generally been overlooked, even by those modern authors 
who use symbolic logic. Leaving aside language systems of a more 
complex structure and with other kinds of variables and considering only 
our language systems 𝔏, there are especially two features of a system which 
may influence c. (i) The number of in in the language system, and hence 
the number of individuals in the domain of individuals of the system, is 
obviously of influence upon c if general sentences are involved; for, as we 
have seen earlier (§ 15A), the meaning and hence the L-semantical 
properties of a general sentence change with the number of in. (ii) The influence 
of the pr upon c in 𝔏 is not so obvious; we shall see later that in certain 
cases the value of c(h,e) in 𝔏 depends upon what other pr besides those 
occurring in h and e belong to 𝔏. [One reason for this dependence is the 
fact that sometimes c(h,e) in 𝔏^π (§ 31) is influenced by the logical width 
(D32-1a) of a molecular predicate expression occurring in h or e and that 
this width is dependent upon the total number π of pr including those not 
occurring in h or e.] 


§ 53. Some Conventions on Degree of Confirmation 


Some conventions concerning c are laid down. They are not part of our 
system of inductive logic but serve merely for heuristic purposes; they will be used 
only in the preliminary considerations in the next section. The conventions, 
among them the customary principles of multiplication and addition, are 
plausible and fulfilled by all adequate quantitative explicata of probability₁. 


Since the task of defining a quantitative concept of confirmation seems 
rather complicated in view of the difficulties some of which have been 
discussed in this chapter, we shall make an attempt in the next section 
of reducing this task to simpler tasks. In order to do so, we shall have to 
make use of some simple, fundamental properties of degree of confirma- 
tion. In this section we shall lay down some of these properties by way of 
conventions. The first convention C1 states only properties which it 
seems clear any adequate quantitative explicatum must have. And it 
seems that indeed practically all authors who use probability₁ as a 
quantitative concept, even if only within a restricted field, have accepted these 
properties. Some authors have laid down these conditions, or similar ones 
from which these follow, as axioms. We shall not do so; our system of 
inductive logic will not be based on axioms but only on explicit definitions. 
The conventions here made will be used only for heuristic purposes, for 
the preliminary considerations in this and the next section; they will not 
be used later on. C1 is not a definition of adequacy; the conditions (a) to 
(d) are necessary, but they together do not form a sufficient condition 
for adequacy. That is to say, a concept fulfilling these conditions may still 
be inadequate; but no concept which violates one of these conditions can 
be regarded as adequate. 


C53-1. Convention on adequacy. A quantitative function c defined for 
some pairs of sentences of a system 𝔏 is not adequate as a quantitative 
explicatum for probability₁ unless it fulfils the following conditions (a) to 
(d) with respect to any sentences in 𝔏 for which it is defined. 

a. L-equivalent evidences. If e and e’ are L-equivalent, then c(h,e) = 
c(h,e’). 
b. L-equivalent hypotheses. If h and h’ are L-equivalent, then c(h,e) = 
c(h’,e). 
c. General Multiplication Principle. c(h . j, e) = c(h,e) × c(j, e . h). 
d. Special Addition Principle. If e . h . j is L-false, then c(h ∨ j, e) = 
c(h,e) + c(j,e). 
It seems plausible to require the four conditions C1a to d for any 
explicatum c for probability₁. (a) says that the value of c does not change 
if one evidence is replaced by another L-equivalent one; (b) says the same 
for hypotheses. These requirements seem natural, since L-equivalent 
sentences have the same content, give the same information about the 
facts, and differ at most in their formulations. It seems obvious that the 
value of probability₁ depends not upon the formulation but merely upon 
the content. Those theories of probability₁ which take as arguments, not 
sentences as our theory does, but propositions (§ 52, method (1)), do not 
need principles corresponding to (a) and (b), since L-equivalent sentences 
express the same proposition. (c) and (d) are generally accepted in 
practically all modern theories of probability₁ (and, incidentally, their 
analogues occur in all theories of probability₂). Earlier authors have usually 
given simpler forms instead of (c) and (d), but later it was recognized 
that those simpler forms hold only under certain restricting conditions. 
(c) and (d) are in accordance with what reasonable people think in terms 
of probability₁ and, in particular, what they are willing to bet on certain 
assumptions. 
Examples. (1) for (c). Suppose X has the knowledge e concerning the 
present political situation in the United States and is willing to bet with the betting 
quotient r₁ (§ 41B) on the hypothesis h that a certain candidate Y will be 
nominated by one of the parties as presidential candidate for the next election; 
suppose further that he makes up his mind that, if he knew h in addition to e, 
then he would be willing to bet with the quotient r₂ on the second hypothesis j, 
that Y will be elected president; then X will be willing, on the basis of his actual 
present knowledge e, to bet with the quotient r₁r₂ on the combined hypothesis 
h . j that Y will be first nominated and then elected. (2) for (d). Suppose again 
that X, on the basis of his knowledge e, is willing to bet with the quotient r₁ on 
the hypothesis h of Y’s nomination as candidate of one of the parties, and 
further with the quotient r₂ on the hypothesis h’ of the nomination of another 
candidate Y’ by the same party; suppose, moreover, that it follows from e that 
only one person can be nominated by the same party, so that h . h’ is 
incompatible with e and hence e . h . h’ is L-false; then X will be willing, on the basis 
of e, to bet with the quotient r₁ + r₂ on the disjunction h ∨ h’, that is, the 
hypothesis that either Y or Y’ will be nominated. 
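
The arithmetic of the two betting examples can be sketched as follows. The numerical quotients are invented for illustration only; the variable names (`r1`, `r2_elected`, `r2_rival`) are hypothetical labels for the quotients in examples (1) and (2).

```python
from fractions import Fraction

# Example (1), multiplication principle (c). All values are hypothetical.
r1 = Fraction(1, 4)          # quotient on h (Y is nominated), given e
r2_elected = Fraction(2, 5)  # quotient on j (Y is elected), given e and h

# Quotient on the combined hypothesis "h and j", given e.
quotient_nominated_and_elected = r1 * r2_elected

# Example (2), special addition principle (d). Here h and h' exclude each
# other on the evidence e, so the quotients simply add.
r2_rival = Fraction(1, 3)    # quotient on h' (Y' is nominated), given e

quotient_either_nominated = r1 + r2_rival
```

With these invented values the combined bet in (1) has quotient 1/10 and the disjunctive bet in (2) has quotient 7/12; the latter stays within the interval 0 to 1 only because the two hypotheses are incompatible on e.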


The convention C1 does not imply that there actually is an adequate 
quantitative explicatum but merely that, if there is any such explicatum, 
then it will fulfil the four conditions (a) to (d). 

Which interval of numbers is chosen for the values of c is a matter of 
an inessential convention as long as c is interpreted only as evidential 
support. But in view of the fact that we interpret c more specifically as 
value of a fair betting quotient (§ 41B), and, in certain cases, as an 
estimate of relative frequency (§ 41D), we take the real numbers of the 
interval 0 to 1, both end points included. This is in accordance with the 
customary use since the beginning of the classical theory. Since the 
tautological sentence ‘t’ (§ 15A) is necessarily true in all possible cases, its 
probability₁ has the maximum value on any (not L-false) evidence, hence the 
value 1. Therefore we lay down the following convention C2. 


C53-2. Convention concerning the maximum value. For any not L-false 
e, c(t,e) = 1. 

The following theorem states some simple consequences of the 
conventions C1 and C2. They are not part of our system of inductive logic 
but will be used only in the preliminary considerations in the next section. 


T53-1. Let c be a quantitative function which (i) fulfils the conditions 
C1a to d, (ii) is defined for every pair of sentences in 𝔏 the second of which 
is not L-false, and (iii) fulfils C2. Then the following holds, provided e is 
not L-false. 

a. c(e . h, t) = c(e,t) × c(h,e). 

Proof. From C1c by substitutions. In the last factor, we replace the 
evidence t . e by e (C1a), which is L-equivalent (T21-5s(1)). 

b. If c(e,t) ≠ 0, c(h,e) = c(e . h, t)/c(e,t). (From (a).) 

c. Addition principle for multiple disjunction. Let e . h . h’ be L-false 
for any pair h,h’ of different sentences taken from the sentences 
h₁, h₂, ..., hₙ. Then c(h₁ ∨ h₂ ∨ ... ∨ hₙ, e) = c(h₁,e) + c(h₂,e) + ... 
+ c(hₙ,e). (From C1d, by mathematical induction.) 

d. c(h,e) + c(~h,e) = 1. 

Proof. c(h,e) + c(~h,e) = c(h ∨ ~h, e) (C1d), = c(t,e) (C1b), = 1 (C2). 

e. c(~t,e) = 0. 

Proof. c(~t,e) = 1 − c(t,e) (d), = 1 − 1 (C2), = 0. 

f. If h is L-false, c(h,e) = 0. (From (e), C1b.) 
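
The consequences (a) to (f) can be checked numerically on a toy function. The sketch below rests on assumptions which are not part of the conventions themselves: sentences are represented by the sets of possible cases in which they hold (so that conjunction becomes intersection and negation becomes complement), and c is computed from invented positive weights. The names `WEIGHTS`, `weight`, `c`, and `neg` are hypothetical.

```python
from fractions import Fraction

# Four possible cases with invented positive weights summing to 1.
WEIGHTS = {0: Fraction(1, 2), 1: Fraction(1, 4),
           2: Fraction(1, 8), 3: Fraction(1, 8)}
ALL = frozenset(WEIGHTS)   # the tautology t holds in every case
NONE = frozenset()         # an L-false sentence holds in no case


def weight(s):
    """Total weight of the cases in which the sentence holds."""
    return sum((WEIGHTS[x] for x in s), Fraction(0))


def c(h, e):
    """Toy degree of confirmation; not defined for L-false evidence."""
    if not e:
        raise ValueError("c is not defined for L-false evidence")
    return weight(h & e) / weight(e)


def neg(s):
    """Negation: the cases in which the sentence does not hold."""
    return ALL - s


t = ALL
e = frozenset({0, 1, 2})
h = frozenset({1, 2, 3})

check_a = c(e & h, t) == c(e, t) * c(h, e)
check_b = c(h, e) == c(e & h, t) / c(e, t)

# (c): three pairwise incompatible hypotheses and their disjunction.
parts = [frozenset({0}), frozenset({1}), frozenset({2})]
disjunction = frozenset().union(*parts)
check_c = c(disjunction, e) == sum(c(p, e) for p in parts)

check_d = c(h, e) + c(neg(h), e) == 1
check_e = c(neg(t), e) == 0
check_f = c(NONE, e) == 0
```

A check of this kind shows only that one particular function satisfies the theorem; it is no substitute for the proofs given above.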


Let ℨ_i be an arbitrary ℨ (state-description, D18-1a) in a finite system 
𝔏_N. ℨ_i is a certain conjunction of basic sentences which is factual (T20-5b), 
not L-false; it represents one of a finite number of possible cases. Before 
an observer makes any observations concerning the individuals of 𝔏_N, he 
cannot know whether the possible case represented by ℨ_i is the actual 
one or not. To attribute to ℨ_i the probability₁ 0 on the basis of the 
tautological evidence ‘t’, that is, without any knowledge of facts, would be an 
a priori decision not to reckon with the occurrence of this possible case. 
This seems entirely unjustified. Therefore we lay down the convention C3 
to the effect that in this case c should be positive. This convention applies 
only to finite systems 𝔏_N; the situation in 𝔏_∞ is different because the 
number of ℨ is infinite. 


C53-3. For any ℨ_i in 𝔏_N, c(ℨ_i,t) > 0. 


This convention C3, in distinction to the earlier ones, does not seem to be 
generally recognized. We shall later (in Vol. II) examine a number of 
inductive methods which have been proposed in the form of a theory either of 
probability₁ or of estimation. We shall find that some of these methods are in conflict 
with C3, but only implicitly, in the following sense. They do not assign any 
value to probability₁ on a tautological evidence, nor do they speak of 
state-descriptions. However, the values which they assign to probability₁ or to 
estimates with respect to a factual evidence correspond, in a certain sense, to the 
assignment of the probability₁ 0 on the tautological evidence to some 
state-descriptions. It will be shown that some of the values which these methods 
actually yield with respect to factual evidence are not adequate. It seems that 
all adequate explicata of probability₁ are in accord with the convention C3. 


§ 54. Reduction of the Problem of Degree of Confirmation 


Some considerations are made which will help us in finding a way toward our 
aim, a definition of c as an explicatum of probability₁. (A) to (C) show how this 
problem can be reduced to simpler problems; (D) and (E) concern further 
requirements for an adequate c. A. c(h,e) in the infinite system may be defined 
as limit of the sequence of values c(h,e) in 𝔏_N with increasing N, provided this 
sequence is convergent (1). B. c₀(j) is defined as the confirmation on the 
tautological evidence, c(j,t), called the null confirmation (2). Then it follows from 
the conventions in § 53 that in 𝔏_N c(h,e) = c₀(e . h)/c₀(e) (3). C. c₀(j) in 𝔏_N is 
the sum of the c₀-values for those state-descriptions (ℨ) in 𝔏_N in which j holds 
(4). Thus our problem is reduced to the problem of finding, for every system 
𝔏_N, a suitable function c₀ for the ℨ in 𝔏_N. D. A function c₀ for the ℨ in 𝔏_N must 
be such that (a) it is positive for every ℨ, and (b) the sum of the c₀-values for 
all ℨ in 𝔏_N is 1. E. A further requirement (6) is laid down in order to make sure 
that c₀-functions chosen for different systems 𝔏_N fit together. The results here 
found will guide us in the construction of the system of inductive logic in the 
following chapters. 


In this section we shall carry out some preliminary considerations; in 
particular, we shall try to reduce the problem of finding a definition for de- 
gree of confirmation for the language systems step by step to simpler 
problems without, however, restricting the general scope of our task. 
These considerations are informal, without any claim to exactness; they 
are merely intended to find a way, or a first part of a way, which may lead 
to our aim. In the next chapter we shall begin the systematic construction 
of inductive logic, guided by what we find here. 


A. Degree of Confirmation in the Infinite System 


Our aim is to find a function c for the system 𝔏_∞ and for every one of 
the systems 𝔏_N such that (i) c has a numerical value in any of the systems 
𝔏_N for every pair h,e of sentences where e is not L-false and in 𝔏_∞ for as 
many of these pairs as seems feasible; (ii) c is adequate as a quantitative 
explicatum for probability₁; (iii) hence c fulfils the conditions of adequacy 
C53-1a to d; (iv) c fulfils furthermore the conventions C53-2 and 3. For 
the considerations in this section we shall not consider the requirement (ii) 
but only the weaker requirement (iii), which follows from the former. In 
later chapters, when we go beyond the partial solution to be here 
discussed, we shall of course have to take into consideration the stronger 
requirement (ii). From (iii) and (iv) it follows that c must have the 
properties stated in T53-1. 

If tentative steps are made toward a solution of the problem indicated, 
it soon becomes clear that one of the chief difficulties involved consists in 
the infinity of 𝔏_∞. The task seems less difficult for the finite systems 𝔏_N. 
Now the latter systems become, with growing N, practically more and 
more similar to 𝔏_∞ so that, for instance, a system with a billion billion 
individuals is practically not much different from 𝔏_∞, although theoretically 
there is, of course, always a fundamental difference between the infinite 
system and any finite system however large. For any pair of sentences 
h,e in 𝔏_∞ there is an n (viz., the largest subscript of an in occurring in 
h or e, or 1 if no in occurs) such that h and e occur likewise in every 𝔏_N 
with N ≥ n. If now we had a definition of c for all systems 𝔏_N and it did 
turn out that the sequence of the values _N c(h,e) converges with 
increasing N toward a limit r (D40-6), then it would not appear unplausible to 
take r as the value of _∞ c(h,e). The condition of convergence will later be 
examined for the function c* which we shall define (in Vol. II); it will be 
seen that it is fulfilled for nongeneral sentences throughout and for 
general sentences at least in a comprehensive class of cases. Thus the 
following definition is suggested: 


(1) _∞ c(h,e) =Df lim (N → ∞) _N c(h,e). 


Hereby our problem is reduced to the problem of a definition of c for all 
systems 𝔏_N. 
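
How a sequence of values for growing N may behave can be sketched under a strong simplifying assumption: the language has a single primitive predicate, and all state-descriptions of a system receive equal weight. That weighting is chosen here only because it is easy to compute; it is not claimed to be an adequate explicatum. The names `c_in_system`, `first_is_P`, `second_is_P`, and `all_are_P` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def c_in_system(n, h, e):
    """Value of c(h,e) in a system with n individuals and one predicate P.

    Illustrative assumption: every state-description has the same weight,
    so c is the proportion of the e-cases in which h also holds.
    """
    cases = list(product([True, False], repeat=n))
    e_cases = [z for z in cases if e(z)]
    if not e_cases:
        raise ValueError("evidence is L-false in this system")
    return Fraction(sum(1 for z in e_cases if h(z)), len(e_cases))


# Sentences are represented as tests on a state-description z, where
# z[k] says whether the (k+1)-th individual has the property P.
first_is_P = lambda z: z[0]
second_is_P = lambda z: z[1]
all_are_P = lambda z: all(z)    # a general sentence

systems = range(2, 8)
nongeneral = [c_in_system(n, second_is_P, first_is_P) for n in systems]
general = [c_in_system(n, all_are_P, first_is_P) for n in systems]
```

Under this assumption the values for the nongeneral pair are the same in every system, so the sequence trivially converges, while the values for the general hypothesis are halved with each additional individual and tend toward 0.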


B. The Null Confirmation 


For the next step in the reduction we make use of T53-1b. This theorem 
shows that c(h,e) is the quotient of the c-values of certain sentences with 
respect to the tautological evidence ‘t’. Since ‘t’ does not give any factual 
information (we say sometimes that it has the null content, T73-1b), 
c(j,t) is the extreme case of c before any factual knowledge is available; 
we call it the null confirmation of j. Since this concept is often used in 
inductive logic, we introduce a simple notation for it: 

(2) c₀(j) =Df c(j,t). 

Now we use T53-1b, but restricted to the systems 𝔏_N. There is first the 
condition that c₀(e) ≠ 0. We shall see later that for all c-functions which 
come into consideration as explicata for probability₁, _N c₀(e) = 0 only if 
e is L-false. (This holds only for 𝔏_N, not for 𝔏_∞; that is the reason why we 
make the present step of reduction after the first.) Thus we obtain from 
T53-1b: 

(3) In any system 𝔏_N, if e is not L-false, 
c(h,e) = c₀(e . h)/c₀(e). 


Hereby, the task of finding a suitable function c for the pairs of sentences 
in 𝔏_N is reduced to the task of finding a suitable function c₀ for the 
sentences in 𝔏_N. 


C. Reduction to State-Descriptions 


For the next step, we make use of the theorem (T21-8c) that any 
non-L-false sentence j in 𝔏_N is L-equivalent to a disjunction j’ whose n 
components (n ≥ 1) are those n ℨ (state-descriptions, D18-1a) in which 
j holds, in other words, the ℨ belonging to ℜ_j (the range of j, D18-6a). 
Therefore, c₀(j) = c₀(j’) (C53-1b, with the evidence ‘t’). If h and h’ are 
any two different ℨ in 𝔏_N, then h . h’ is L-false (T21-8a) and hence also 
t . h . h’. Therefore, according to the addition principle (T53-1c) the 
following (4a) holds for c₀(j’), and hence likewise for c₀(j). 

(4) a. If j is not L-false, c₀(j) is the sum of the c₀-values for all ℨ in ℜ_j. 

b. If j is L-false, c₀(j) = 0 (T53-1f). 

Thus the task of finding a suitable function c₀ for all sentences in 𝔏_N is 
reduced to the task of finding a suitable function c₀ for all ℨ in 𝔏_N, hence 
for a very restricted, special kind of sentences in 𝔏_N. This ends the 
reduction of our problem. 


D. Null Confirmation for the State-Descriptions 


We shall now consider requirements which a function c₀ for the ℨ in 𝔏_N 
should fulfil. If a function c₀ for the ℨ is chosen, then it determines 
uniquely a function c₀ for the sentences according to (4), and a function c for the 
pairs of sentences according to (3). 

First let us see which restricting conditions are imposed upon the 
choice of a function c₀ for the ℨ in 𝔏_N by the requirement that c fulfil also 
the conventions C53-2 and 3. 

Since ‘t’ holds in every ℨ (D18-4c), c₀(t) is the sum of the c₀-values for 
all ℨ in 𝔏_N. On the other hand, c₀(t) = 1 (C53-2). This leads to the second 
one of the following requirements; the first is given by C53-3. 


(5) Requirements for c₀-functions for the ℨ in 𝔏_N. 
a. For any ℨ_i in 𝔏_N, c₀(ℨ_i) > 0. 
b. The sum of the c₀-values for all ℨ in 𝔏_N is 1. 


Suppose we start with some c₀-function for the ℨ which fulfils these two 
requirements; it determines a c₀-function for the sentences according to 
(4); and this, in turn, determines a c-function according to (3). Then the 
latter will be called a regular c-function. 
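
The three steps of the construction (a function for the state-descriptions satisfying (5), then (4) for sentences, then (3) for pairs of sentences) can be sketched in a small finite system. The sketch assumes a language with three individuals and a single primitive predicate P; the particular weights and all names (`STATE_DESCRIPTIONS`, `C0_Z`, `c0`, `c`, `Pa1`, `Pa2`) are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

N = 3  # illustrative: three individuals, one primitive predicate P

# A state-description says, for each individual, whether it is P.
STATE_DESCRIPTIONS = list(product([True, False], repeat=N))

# Step 1, requirements (5): positive values whose sum is 1. The weights
# are deliberately non-uniform; any positive choice would do here.
raw = {z: 1 + sum(z) for z in STATE_DESCRIPTIONS}
total = sum(raw.values())
C0_Z = {z: Fraction(v, total) for z, v in raw.items()}


def c0(sentence):
    """Step 2, (4): sum over the state-descriptions in which it holds."""
    return sum((C0_Z[z] for z in STATE_DESCRIPTIONS if sentence(z)),
               Fraction(0))


def c(h, e):
    """Step 3, (3): c(h,e) = c0(e . h) / c0(e); undefined for L-false e."""
    denominator = c0(e)
    if denominator == 0:
        raise ValueError("evidence is L-false; c is not defined")
    return c0(lambda z: e(z) and h(z)) / denominator


# Sentences are represented as tests on a state-description.
Pa1 = lambda z: z[0]
Pa2 = lambda z: z[1]
tautology = lambda z: True
contradiction = lambda z: False
```

Because every state-description receives a positive value, `c0(e)` is zero only for a sentence holding in no state-description, so the guard in `c` excludes exactly the L-false evidences.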


E. The Requirement of Fitting Together 


There is still another point to be considered in choosing a suitable 
c₀-function for the finite language systems. For a given system 𝔏_N, there are 
many c-functions adequate as explicata for probability₁, and indeed an 
infinite number of them. This is analogous to the situation with the 
temperature concept (§ 5); there is an infinite number of quantitative 
explicata for the comparative explicandum Warmer, that is to say, an 
infinite number of possible scale forms for temperature, several of which 
have actually been used in physics (among them several scales based on 
various thermometrical substances, as mercury, alcohol, hydrogen, etc., 
and the thermodynamic scale). This does not mean that all these 
concepts, either in the case of temperature or in the case of c, are equally good 
as explicata; one concept may have certain advantages and another 
concept may have advantages in other respects, and some concepts may be 
clearly inferior to certain others. It means merely that every one of these 
concepts is not entirely inadequate as explicatum, although there is, of 
course, no sharp boundary line. Now suppose that we choose a sequence 
of c₀-functions, one for each system 𝔏_N, say, _1 c₀ for 𝔏_1, _2 c₀ for 𝔏_2, etc. Then 
it may very well happen that each one of these c₀-functions is adequate for 
its system, that is to say, it leads to a c-function adequate as an explicatum 
for probability₁, and that nevertheless the different c₀-functions of the 
sequence do not fit together. In other words, the choices of one 
c₀-function for each of the systems 𝔏_N should not be made independently of one 
another. As we have seen earlier (§ 15A), any nongeneral sentence has 
the same meaning in all systems 𝔏_N in which it occurs. Therefore it is to 
be required that, for any nongeneral sentences h and e, c(h,e) have the 
same value in all systems 𝔏_N in which both sentences occur; and hence 
it is to be required that, for any nongeneral sentence j, c₀(j) have the same 
value in all systems 𝔏_N in which j occurs. However, we choose a function 
c₀ for the ℨ, not for the sentences in general; the corresponding function 
c₀ for the sentences is determined by the function c₀ for the ℨ. Therefore 
we have now to examine what the requirement just stated means for the 
latter function. 

Let i be any ℨ in 𝔏_N. Then i is a nongeneral sentence in 𝔏_N and hence 
also in 𝔏_{N+1}; but it is not a ℨ in 𝔏_{N+1}. ℜ_i in 𝔏_N is {i} (T19-6). ℜ_i in 𝔏_{N+1} is 
the class of those ℨ in 𝔏_{N+1} of which i is a subconjunction (T19-5b). 
Therefore (see (4a) above), _{N+1} c₀(i) is the sum of the c₀-values for all those ℨ in 
𝔏_{N+1} of which i is a subconjunction. This suggests the following 
requirement. 

(6) Requirement of fitting together: for every N and for every ℨ_i in 𝔏_N, 
_N c₀(ℨ_i) must be equal to the sum of the _{N+1} c₀-values for all those 
ℨ in 𝔏_{N+1} of which ℨ_i is a subconjunction. 
If we fulfil this requirement in our choices of c₀-functions, one for each 
system 𝔏_N, then these choices are dependent upon one another in the 
following way. Suppose we have chosen a c₀-function for 𝔏_{N+1}; then a certain 
c₀-function for 𝔏_N is thereby uniquely determined, and the choice of a 
c₀-function for 𝔏_{N+2} is, although not determined uniquely, restricted 
within rather narrow limits. In the next chapter we shall deal with those 
sequences of functions _1 c₀, _2 c₀, ..., _N c₀, ... for the systems 𝔏_1, 𝔏_2, ..., 
𝔏_N, ..., which fulfil the requirement (6), and with the sequences of 
functions _1 c, _2 c, ..., _N c, ... based upon the c₀-functions in the manner 
earlier explained; the latter sequences will be called fitting sequences of 
c-functions. When we shall speak in inductive logic about c with 
respect to different systems 𝔏_N, we shall usually presuppose that the 
c-functions for the different systems 𝔏_N fit together in the sense of forming a 
sequence of the kind just described. 
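
The requirement of fitting together can be illustrated by a sketch which again assumes a language with one primitive predicate, so that each state-description of a system with N individuals is a subconjunction of exactly two state-descriptions of the system with N + 1 individuals. All function names and the particular weights are hypothetical.

```python
from fractions import Fraction
from itertools import product


def state_descriptions(n):
    """All state-descriptions for n individuals and one predicate P."""
    return list(product([True, False], repeat=n))


def uniform_c0(n):
    """An illustrative c0-function: equal weight for each state-description."""
    zs = state_descriptions(n)
    return {z: Fraction(1, len(zs)) for z in zs}


def induced_c0(c0_next):
    """The c0-function for N individuals that requirement (6) determines
    uniquely from a given c0-function for N + 1 individuals."""
    induced = {}
    for z_next, value in c0_next.items():
        z = z_next[:-1]  # drop the last individual: z is a subconjunction
        induced[z] = induced.get(z, Fraction(0)) + value
    return induced


def fits(c0_n, c0_next):
    """True if the two c0-functions satisfy requirement (6)."""
    return induced_c0(c0_next) == c0_n


# Equal weights in both systems fit together.
fits_uniform = fits(uniform_c0(2), uniform_c0(3))

# A non-uniform choice for three individuals (positive, with sum 1) does
# not fit the uniform choice for two; it determines a different function.
skewed_3 = {z: Fraction(1 + sum(z), 20) for z in state_descriptions(3)}
fits_mixed = fits(uniform_c0(2), skewed_3)
induced_2 = induced_c0(skewed_3)
```

The function `induced_c0` reflects the one-directional dependence described above: a choice for the larger system fixes the function for the smaller one, whereas the converse direction leaves freedom in how each value is divided among the extending state-descriptions.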


In the next chapter we shall not yet choose a particular function c. 
Instead we shall study the common properties of all regular c-functions. 
This will be a general inductive logic. Only much later (in Vol. II) shall 
we choose a particular one among the regular c-functions, designated by 
‘c*’ (see Appendix, § 110). Then we shall base our special theory of 
inductive logic on this function c*. 




CHAPTER V 


THE FOUNDATION OF QUANTITATIVE INDUCTIVE LOGIC: 
THE REGULAR c-FUNCTIONS 


According to the plan previously outlined (in § 54), we lay down the 
following definitions. We start with a measure function for the ℨ (state-descriptions) 
in a finite system 𝔏_N; this is any distribution of positive real numbers, whose 
sum is 1, among the ℨ (D55-1). Then we define m(j) for a sentence j as the sum 
of the m(ℨ) for those ℨ in which j holds (D55-2), and c(h,e) as m(e . h)/m(e) 
(D55-3 and 4). All m-functions and c-functions constructed in this way are 
called regular. Thus c (i.e., any regular c-function), which is the fundamental 
concept of inductive logic, measures the extent to which one range is partially 
included in another; on the other hand, L-implication, which is the 
fundamental concept of deductive logic, corresponds to total inclusion of one range 
in another (see diagrams in § 55B). The values of m- and of c-functions for 
the infinite system 𝔏_∞ are defined as limits of the values for finite systems 
(§ 56). The task of this chapter is to construct, on the basis of the definitions 
mentioned, the theory of the regular c-functions as the fundamental part of 
quantitative inductive logic. The null confirmation c₀(h) is defined as the c of h 
on the tautological evidence ‘t’, hence as the confirmation of h before any 
factual knowledge is available (D57-1). It turns out that c₀ coincides with m 
(T57-3). For any L-true sentence j, m(j) (and hence c₀(j)) = 1; but it 
happens sometimes (however, only among general sentences in 𝔏_∞) that also a 
factual sentence i has the m-value 1; in this case, i is said to be almost L-true 
(D58-1a). Among the theorems concerning regular c-functions (§§ 59, 60, 61), 
we find the fundamental theorems of the classical theory, e.g., the general and 
the special addition theorem (T59-1k and l), and the general multiplication 
theorem (T59-1n); furthermore, among the theorems dealing with the 
confirmation of a hypothesis on the basis of relevant observations (§§ 60 and 61), 
we find the general division theorem (T60-1c) and the much-debated Bayes’s 
theorem (T60-6). 

An examination of some modern axiom systems for probability₁ shows that
they are all rather weak (§ 62); they are contained in what we call the theory 
of regular c-functions. In our view, this theory is only a small part of inductive 
logic. We shall construct the remaining parts in later chapters. 


Regular m- and c-Functions for Finite Systems 


A. Following the plan outlined in § 54, we lay down the following
definitions. A function m (corresponding to c_0 in § 54B) is called a
regular m-function under the following conditions: first, m is applied to
the ℨ in 𝔏_N and assigns to them positive real numbers whose sum is 1 (D1);
then m is extended to all sentences in 𝔏_N by taking as m(j) the sum of the
m-values for all ℨ in ℜ_j (D2). If m is regular and c(h,e) is defined as
m(e . h)/m(e), c is called a regular c-function (D3, D4). The regular
c-functions are just those which fulfil the conventions in § 53; they seem
to include all functions which might be regarded as adequate explicata for
probability₁.

B. By these definitions, the difference between deductive and inductive 
logic becomes clear. Deductive logic deals with L-implication, hence with the 
case of total inclusion of one range in another. Inductive logic deals with c, 
which is the ratio of partial inclusion of one range in another, measured by m; 
hence, c is, so to speak, partial L-implication (see the diagrams).

C. Incidentally, probability₂, i.e., relative frequency, is likewise the
ratio of partial inclusion of one class in another. This explains the
analogy between the theories of probability₁ and probability₂. However,
there remains this fundamental difference: for probability₂ the partial
inclusion is a factual matter, and hence the value of probability₂ is
established empirically; on the other hand, probability₁ concerns partial
inclusion of ranges, which is of a purely logical nature.


A. Regular m- and c-Functions


In this chapter we begin the construction of quantitative inductive
logic. However, here we shall not yet select one function c but rather deal
with a very comprehensive class of functions, which we call the regular
c-functions. All functions which I would regard as adequate quantitative
explicata for probability₁ belong to this class.

The fundamental conception which leads us to the definition of the 
regular c-functions is very simple. Briefly speaking, they are those func- 
tions which fulfill the conventions laid down in § 53. Therefore, our con- 
struction here will follow the plan outlined in § 54. 

We call a numerical function m for the ℨ in 𝔏_N a regular measure
function or regular m-function (D1) if it fulfils the two requirements for
c_0 stated in § 54D: the values are positive, and their sum is 1. [At this
step, we do not use the symbol ‘c_0’, but a neutral symbol ‘m’. Later, ‘c_0’
will be defined as confirmation on the basis ‘t’ (D57-1); and then it can
easily be shown that c_0 and m coincide (T57-3).] Then, according to
§ 54C, we extend a function m for the ℨ in 𝔏_N so as to apply to all
sentences in 𝔏_N, by defining m(j) as the sum of the m-values for all ℨ in
ℜ_j (D2). [If m is a function for the ℨ, then, strictly speaking, we should
have to use another symbol, say, ‘m′’, for the corresponding function for
the sentences. However, the definition would easily show (with T19-6) that,
for any sentence j which is a ℨ, m′(j) = m(j); in other words, m′ is merely
an extension of the function m. Therefore it is convenient to use the same
symbol for both functions.] Then, in accordance with § 54B, on any function
m we may base a function c by defining c(h,e) as m(e . h)/m(e) (D3).
If c is based in this way on a regular m-function, we call it a regular
c-function (D4).

+D55-1. m is a regular measure function (or, briefly, a regular
m-function, or a regular m) for the ℨ in 𝔏_N =Df m fulfils the following
two conditions.

a. For every ℨ_i in 𝔏_N, m(ℨ_i) is a positive real number.

b. The sum of the values of m for all ℨ in 𝔏_N is 1.


T55-1. Let m be a regular m-function for the ℨ in 𝔏_N. Then, for every ℨ_i
in 𝔏_N, 0 < m(ℨ_i) < 1. (From D1.)

+D55-2. Let m be a regular m-function for the ℨ in 𝔏_N. We extend m
to a regular m-function for the sentences in 𝔏_N in the following way.

a. For any L-false sentence j in 𝔏_N, m(j) =Df 0.

b. For any non-L-false sentence j in 𝔏_N, m(j) =Df the sum of the
values of m for the ℨ in ℜ_j.


D3 introduces merely an auxiliary term for D4. 


D55-3. Let m be a numerical function for the sentences in 𝔏_N, and c be
a numerical function for pairs of sentences in 𝔏_N. c is based upon m =Df
for any sentences e and h in 𝔏_N, where m(e) ≠ 0, c(h,e) = m(e . h)/m(e);
for any e, where m(e) = 0, c(h,e) has no value.

+D55-4. c is a regular confirmation function (or, briefly, a regular
c-function, or a regular c) for 𝔏_N =Df c is based (D3) upon a regular
m-function for the sentences in 𝔏_N.

Instead of ‘m for 𝔏_N’ and ‘c for 𝔏_N’, we shall sometimes write ‘_Nm’ and
‘_Nc’, respectively.

T2 is an immediate consequence of the given definitions; it serves as 
lemma for later theorems. 

T55-2. Let c be a regular c-function for 𝔏_N. Then there is a regular m for
the sentences of 𝔏_N (namely, that upon which c is based) such that the
following holds.

a. For any pair of sentences h,e in 𝔏_N, where m(e) ≠ 0,
c(h,e) = m(e . h)/m(e). (From D4, D3.)

b. c has a value for a pair of sentences h,e in 𝔏_N if and only if
m(e) ≠ 0, hence if and only if e is not L-false in 𝔏_N. (From D4, D3, D2.)
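
The definitions D55-1 to D55-4 can be traced on a very small example. The
following Python sketch is only an illustration under stated assumptions: the
toy language (one predicate applied to two individuals), the particular
weights, and all function names are hypothetical and do not come from the
text. A state-description is represented as a tuple of truth-values, and a
sentence as a function from state-descriptions to truth-values.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy system: one predicate 'P', two individuals a1, a2.
# A state-description is a tuple (P a1, P a2) of truth-values.
STATE_DESCRIPTIONS = list(product([True, False], repeat=2))

# Assumed weights for illustration: positive, summing to 1 (D55-1).
WEIGHTS = dict(zip(
    STATE_DESCRIPTIONS,
    [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)],
))
assert all(w > 0 for w in WEIGHTS.values())
assert sum(WEIGHTS.values()) == 1


def m(sentence):
    """D55-2: sum of the weights of the state-descriptions in the range.

    An L-false sentence has an empty range, so the sum is 0.
    """
    return sum((w for z, w in WEIGHTS.items() if sentence(z)), Fraction(0))


def c(h, e):
    """D55-3: m(e . h) / m(e), or None (no value) when m(e) = 0."""
    m_e = m(e)
    if m_e == 0:
        return None
    return m(lambda z: e(z) and h(z)) / m_e


def p_a1(z):
    return z[0]


def p_a2(z):
    return z[1]


def contradiction(z):
    return z[0] and not z[0]


value = c(p_a2, p_a1)               # (1/2) / (1/2 + 1/6)
no_value = c(p_a2, contradiction)   # L-false evidence
```

With these assumed weights, `value` is 3/4 and `no_value` is `None`, in line
with T55-2b: c has a value exactly when the evidence is not L-false.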


Remarks on the exclusion of an L-false evidence. Let c be regular and based
upon m. According to our definitions, if e is L-false and hence m(e) = 0,
c(h,e) has no value (T2b). This is not the only possible procedure. As an
alternative, let us consider a definition D3′ which is like D3 except for
stating that, if m(e) = 0, c(h,e) = 1; D4 remains unchanged. Here, c has a
value for every pair of sentences in 𝔏_N. If we want at all to assign a
value to c in the case mentioned, the value 1 seems the most natural. For
our original definitions have the effect that, except for L-false e, if
⊢ e ⊃ h (i.e., e L-implies h), c(h,e) = 1 (see below, T59-1b); now, if e is
L-false, then, for every h, ⊢ e ⊃ h (T20-2h); hence D3′ yields the result
that, without exception, if ⊢ e ⊃ h, c(h,e) = 1. In a similar way, most of
the theorems to be stated later are valid on the basis of the alternative
definition D3′ in a more general way, with omission of the restricting
condition ‘if the evidence is not L-false’. On the other hand, there are
some theorems which must retain this restricting condition even on the basis
of D3′. For instance, this holds obviously for the theorem (T59-1f) that, if
h is L-false, c(h,e) = 0; for, if e is also L-false, D3′ would yield here
c = 1. The same holds for the special addition theorem (T59-1l); for, if e
is L-false, on the basis of D3′ c(h,e) = 1 and likewise c(~h,e) = 1;
however, c(h ∨ ~h,e) cannot be the sum of these two values, i.e., 2, because
no c-value exceeds 1. These and similar consequences of D3′ cannot be
regarded as disadvantages in comparison with D3, because they mean merely
that on the basis of D3′ some theorems must contain a certain restricting
condition, which on the basis of D3 many more theorems must contain.

The disadvantages of D3′ appear when we come to 𝔏_∞. Here, m and c will
be defined (in § 56) as limits of their values for the finite systems 𝔏_N,
in accordance with § 54A. In 𝔏_N the chief difference between D3 and D3′
consists in the fact that in certain cases D3′ gives a value to c where D3
does not. In 𝔏_∞ we find the opposite; in some special cases, D3′ gives no
value where D3 does. This happens when the additional values of c in the
finite systems on the basis of D3′ destroy the existence of a limit. [For
exemplification, let us refer to those functions m and c which we shall
later select as basis for inductive logic (they will then be designated by
‘m*’ and ‘c*’, § 110A). Let e be a purely general sentence containing the
primitive predicates ‘P’ and ‘R’ of degree one and two, respectively, and
saying that the relation R constitutes a one-one correspondence between
those individuals which have the property P and those which have not.
Obviously, if e is true, P and non-P must have the same cardinal number,
which is impossible if N is an odd number. Therefore, in every 𝔏_N with an
odd N, e is L-false and hence m(e) = 0; but in every 𝔏_N with an even N, e
is factual and hence m(e) > 0. We shall find later that these positive
m-values converge toward the limit 0 with increasing N. Let h be any
factual, molecular sentence such that c(h,e) = c(h,t) (hence e is initially
irrelevant for h, D65-2d) = m(h) (T57-3) = r, where r is constant, i.e.,
independent of N, and 0 < r < 1. On the basis of D3, c(h,e) has no value in
the odd systems, but in all even systems its value is always r. Thus, for
the even systems we have an infinite sequence with the constant value r;
hence its limit is r; therefore, in 𝔏_∞, c(h,e) = r. On the other hand, the
situation is quite different with D3′. Here, in every odd system,
c(h,e) = 1, while in every even system, as before, c(h,e) = r < 1. Thus,
here, c(h,e) oscillates between two constant values and hence has no limit;
therefore, c(h,e) has no value in 𝔏_∞.]
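
The contrast between D3 and D3′ in this example can be made concrete with a
short Python sketch. It is an illustration only: the value chosen for r and
the function names are assumptions, and inspecting a finite window of N is of
course no proof of convergence or divergence; it merely shows which values
keep recurring.

```python
from fractions import Fraction

R = Fraction(1, 3)  # hypothetical constant r with 0 < r < 1


def c_d3(n):
    """Value of c(h,e) in the system with N = n under D3.

    For odd N the evidence e is L-false, so there is no value (None).
    """
    return None if n % 2 == 1 else R


def c_d3_prime(n):
    """Value of c(h,e) under the alternative D3': 1 for L-false evidence."""
    return Fraction(1) if n % 2 == 1 else R


def recurring_values(c_n, start, stop):
    """Distinct defined values of c_n(N) for start <= N < stop."""
    return {c_n(n) for n in range(start, stop) if c_n(n) is not None}


# Under D3 only the even systems contribute, always with the value r.
d3_values = recurring_values(c_d3, 1000, 1100)

# Under D3' two different values alternate for arbitrarily large N,
# so the sequence cannot approach a single limit.
d3_prime_values = recurring_values(c_d3_prime, 1000, 1100)
```

Here `d3_values` contains the single value r, while `d3_prime_values`
contains both 1 and r.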

The convention that no value is assigned to the function c in the case of an
L-false evidence seems generally accepted, if the point is at all discussed.
We find it, for instance, in Keynes’s theory of probability ([Probab.],
p. 116) and in Hosiasson’s theory of degree of confirmation
([Confirmation], p. 133). Some authors even of more exactly formulated
modern systems fail to make a convention either way; this, however, leads to
contradictions (see § 62).


B. Deductive and Inductive Logic 


D3 may help us to see clearly both the similarity and the difference
between deductive and inductive logic. The fundamental concept of deductive
logic is L-implication. Hence an elementary sentence of deductive logic has
the form ‘e L-implies h’. This sentence holds if and only if ℜ_e (the range
of e) is entirely contained in ℜ_h. On the other hand, the fundamental
concept of inductive logic is degree of confirmation. Hence an elementary
sentence of inductive logic has the form ‘c(h,e) = r’. This sentence says,
according to D3, that m(e . h)/m(e) = r. We may regard m(e) as the measure
assigned to ℜ_e. Then m(e . h) is the measure assigned to ℜ(e . h); this
class is ℜ(e) ∩ ℜ(h), in other words, that part of ℜ(e) which is contained
in ℜ_h. Thus, for example, ‘c(h,e) = 3/4’ says that not the whole of ℜ_e is
contained in ℜ_h but only a part of it which, measured by m, is
three-fourths of ℜ_e. This is shown in the accompanying diagram, where the
areas represent the ranges of the sentences.


[Diagrams. Deductive Logic: ‘e L-implies h’ means that the range of e is
entirely contained in that of h. Inductive Logic: ‘c(h,e) = 3/4’ means that
three-fourths of the range of e is contained in that of h.]


Thus both deductive and inductive logic concern relations between the 
ranges of sentences. The range of a sentence is independent of any facts, 
dependent merely upon the meaning of the sentence as determined by the 
semantical rules of the language system in question. If these rules are 
given, then both the relations studied in deductive logic and those studied 
in inductive logic can be established; no knowledge of facts (that is,
extralinguistic, contingent facts) is required. This characterizes both theories
as branches of logic. Deductive logic deals with the relation of total in- 
clusion between ranges. Inductive logic deals with the relation of partial 
inclusion between ranges, so to speak, partial L-implication. Therefore, 
inductive logic (here always meant as quantitative inductive logic) re- 
quires the introduction of a new concept, a numerical, additive measure 
function for the ranges. This may be illustrated by an analogy with geome- 
try. A sentence like ‘The whole of Illinois is contained in the United 
States’ expresses a purely topological relation and hence does not require 
a measure function. On the other hand, a sentence of the form
‘Three-fourths of Illinois lies north of 39° lat.’ presupposes a measure
function for geographical areas.

We can now see more clearly what the definition of c by D3 amounts
to if we remember what the range of a sentence is. ℜ_e was defined as the
class of those ℨ (state-descriptions) in which the sentence e holds
(D18-6a). The ℨ describe the possible states of the domain of individuals
(§ 18A). Suppose the whole knowledge which an observer X has gained
by observations of the individuals is expressed by the sentence e. Then
all he knows is that the actual state of the domain of individuals is one
of those described by the ℨ in ℜ_e; but he does not know which one of these
it is. Suppose now that X is interested in a certain hypothesis h; he wants
to obtain a judgment about h on the basis of his knowledge. For this
purpose, he examines the relation between the range of e and that of h. If
he finds that ℜ_e ⊂ ℜ_h (in other words, e L-implies h), then the one ℨ
which describes the actual state belongs also to ℜ_h, and hence h must
likewise be true. If he finds that ℜ_e lies entirely outside of ℜ_h, then
the one ℨ which describes the actual state cannot belong to ℜ_h and hence h
must be false. In these two cases, X has used deductive logic and has
thereby obtained a definitive judgment about h. [The certainty of X’s
judgment about h, either positive or negative, is of course not absolute but
only relative to e; that is to say, he knows either h or ~h with at least
the same degree of certainty as he knows e.] If, however, he finds that only
a part of the range of e is contained in that of h, then he must use
inductive logic. In this case, as long as he does not make new observations
beyond those expressed in e, he cannot find certainty concerning h; he
can only determine a probability₁, a degree of confirmation of h on the
evidence e. He knows that the actual state is described by one of the ℨ in
ℜ_e (and is hence represented by one of the points in the area e in the
right-hand diagram above). If now the actual state were described by
one of those ℨ which belong to the part of ℜ_e contained in ℜ_h (and hence
were represented by a point in the shaded area of the diagram), then h
would be true; otherwise, h would be false. Therefore, the larger the part
of ℜ_e overlapping with ℜ_h is in relation to the whole of ℜ_e, in other
words, the more of those possibilities which are still left open by e are
such that h would hold in them, the more reason has X, who knows e, for
expecting h to be true. Thus the definition of c by the quotient in D3
becomes plausible. [One might perhaps think at first that the introduction
of a special measure function m for the ranges was not necessary, that we
could simply take the quotient of the number of ℨ in ℜ(e . h) by the number
of ℨ in ℜ_e. There is such a quotient because here, where we speak of
the finite systems 𝔏_N, the number of ℨ is finite. However, this definition
would not yield an adequate explicatum for probability₁. This definition
would amount to taking that regular m-function which has the same
value for all ℨ. We shall later discuss this m-function m† and the
c-function c† based upon it; then we shall find that the latter is
inadequate (see § 110A). Therefore, in order to obtain an adequate
explicatum, we must not simply count the ℨ but, so to speak, weight them; in
other words, we must find a distribution of m-values among the ℨ which does
not assign equal values but is nevertheless not arbitrary. This constitutes
one of the main problems we shall have to solve.]
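
The counting definition mentioned in the bracketed remark can be tried out
directly. The Python sketch below is only an illustration: the toy language
(one predicate, three individuals) and the function names are assumptions,
and the text itself defers the argument for inadequacy to § 110A. What the
sketch shows is one feature of the equal-weight measure that may be read as
pointing in that direction: evidence about two individuals leaves the value
for a third individual unchanged.

```python
from fractions import Fraction
from itertools import product


def counting_c(h, e, n):
    """Quotient of the number of state-descriptions in the range of e . h
    by the number in the range of e, for one predicate and n individuals.

    Returns None when the range of e is empty (e is L-false).
    """
    state_descriptions = list(product([True, False], repeat=n))
    range_e = [z for z in state_descriptions if e(z)]
    if not range_e:
        return None
    range_eh = [z for z in range_e if h(z)]
    return Fraction(len(range_eh), len(range_e))


def h(z):
    """Hypothesis: the third individual has the property."""
    return z[2]


def e(z):
    """Evidence: the first two individuals have the property."""
    return z[0] and z[1]


def tautology(z):
    return True


with_evidence = counting_c(h, e, 3)
without_evidence = counting_c(h, tautology, 3)
```

Both `with_evidence` and `without_evidence` come out as 1/2 in this toy case.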

On the basis of an idea of Wittgenstein ([Tractatus] *5.15), Friedrich
Waismann ([Wahrsch.], pp. 236 f.) gives a definition of probability as a
quotient of measures of ranges, like our D3. (For the requirements which
Waismann lays down concerning the measure function see below, § 62.) Our
explanation above of L-implication as inclusion of ranges and c as partial
inclusion of ranges is in its essential features suggested by Waismann’s
discussion. Thus the foundations of our inductive logic are in complete
agreement with his conception. However, our
further construction of inductive logic seems to differ in the following point from 
Waismann’s plan, which is only indicated but not carried out. He says that the 
choice of the measure function is to be made in such a manner that “we obtain 
accordance with statistical experience” (op. cit., p. 242). In our theory, on the 
other hand, the choice of an m-function is regarded as a purely logical ques- 
tion; we shall later define a certain m-function m* as basis for inductive logic. 
According to our conception, the empirical knowledge of facts enters inductive 
logic only at one point, viz., as formulated in the evidence e; but it cannot de- 
termine the definition of c. (Compare, however, the following discussion.) 


Some philosophers seem to have feelings against choosing a measure 
function m once for all, independently of our experiences, so to speak, 
a priori. They believe that it would be more in accord with the scientific 
method or with the principle of empiricism if the measure function were 
to change with the accumulating experiences (compare the above refer- 
ence to Waismann). I think our method is in perfect accord with em- 
piricism and, in particular, with the requirement that inductive procedure 
should be based on our empirical knowledge e. This requirement is
fulfilled because in our theory the value of c is dependent upon e. If
someone wishes c to be based upon a measure function for the ranges which
changes with the changing experiences, his wish can be fulfilled within our
theory in the following way. Suppose that a regular m-function m is chosen;
however, it is now regarded not as the basic measure function but merely as
a calculatory convenience. As the basic measure function we take now a
new function m_e which is defined with the help of m as follows, first for
the ℨ in 𝔏_N with respect to any non-L-false e:

(1) a. If e does not hold in ℨ_i, m_e(ℨ_i) = 0.
    b. If e holds in ℨ_i, m_e(ℨ_i) = m(ℨ_i)/m(e).


Thus m_e is dependent upon e and hence fulfils the requirement that its
value change with the changing experiences. Then m_e(j) for any other
sentence j in 𝔏_N is defined (in analogy to D2) as the sum of the values
of m_e for the ℨ in ℜ_j or as 0 if j is L-false. Then the function c′ is
simply defined as follows:


(2) For any pair of sentences e,h in 𝔏_N, where e is not L-false,
    c′(h,e) =Df m_e(h).


It is easily seen that this function c′ coincides with the function c based
upon m according to our method (D3); hence the method just outlined
is essentially the same as our method and differs from it only in the form
of representation.

Proof. c′(h,e) = m_e(h) = Σ[m(ℨ_i)]/m(e), where the sum extends over those
ℨ in ℜ_h in which e holds, hence the ℨ in ℜ(e . h). Therefore the sum is
m(e . h). Hence the quotient is m(e . h)/m(e) = c(h,e).
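
The coincidence of c′ with c can also be checked numerically on a small case.
The following Python sketch is an illustration under assumptions: the toy
language (one predicate, two individuals), the weights, the selection of
sentences, and the function names are all hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy system: state-descriptions are tuples (P a1, P a2).
STATE_DESCRIPTIONS = list(product([True, False], repeat=2))
WEIGHTS = dict(zip(
    STATE_DESCRIPTIONS,
    [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)],
))


def measure(weights, sentence):
    """Sum of the given weights over the range of the sentence (D2)."""
    return sum((w for z, w in weights.items() if sentence(z)), Fraction(0))


def c(h, e):
    """D3, assuming e is not L-false: m(e . h) / m(e)."""
    return measure(WEIGHTS, lambda z: e(z) and h(z)) / measure(WEIGHTS, e)


def c_prime(h, e):
    """Definitions (1) and (2): first build m_e, then take m_e(h)."""
    m_of_e = measure(WEIGHTS, e)
    weights_e = {
        z: (w / m_of_e if e(z) else Fraction(0))   # (1b) and (1a)
        for z, w in WEIGHTS.items()
    }
    return measure(weights_e, h)                   # (2)


# A few non-L-false sentences of the toy system, chosen for illustration.
SENTENCES = [
    lambda z: z[0],
    lambda z: z[1],
    lambda z: not z[0],
    lambda z: z[0] or z[1],
    lambda z: z[0] and z[1],
]

agree = all(c_prime(h, e) == c(h, e) for h in SENTENCES for e in SENTENCES)
```

For every pair drawn from this sample, `agree` confirms that both routes
yield the same value, as the proof requires.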


C. Probability₁ and Probability₂


We have earlier (§ 41D) analyzed the relation between probability₁
and probability₂ by showing that, under certain conditions, probability₁
may be regarded as an estimate of probability₂. This relation explains
the striking analogy between theorems of the two fields. We are now in a
position to throw some light on this analogy from a different angle,
looking at the logical forms of the two concepts rather than at their
meanings. Both concepts may be represented as quotients of measures of
certain classes. This was shown for probability₁ by the above diagrams. We
can now use the same diagrams for representing probability₂ if we give them
a new interpretation. For the sake of simplicity, let us here consider a
finite domain of individuals. The largest rectangle in each of the two
diagrams is now taken as representing, not the class of the ℨ in 𝔏_N, but
the class of all N individuals dealt with in 𝔏_N. Let ‘M₁’ and ‘M₂’
designate two factual properties, say, Swan and White, respectively. Let the
rectangle marked by ‘e’ in each diagram now represent the extension of M₁
(the class of swans), and that marked by ‘h’ the extension of M₂ (the class
of white things). Then the left-hand diagram shows the situation where
all swans are white, and the right-hand diagram shows the situation
where three-fourths of the swans are white. Now, the probability₂ of M₂
with respect to M₁ (the probability₂ of a swan being white) is the
relative frequency nc(M₁ . M₂)/nc(M₁), where ‘nc(...)’ stands for ‘the
cardinal number of . . .’; in the diagram this probability₂ is 3/4. We see
a perfect analogy between this definition for probability₂ and that given
in D3 for c or probability₁. Consequently, the theory of probability₂
contains theorems analogous to those for probability₁ (c) which we shall
base on D3 (among them also theorems of multiplication, addition, and
division analogous to those we shall state in the following sections).
However, there are the following important differences between the two
concepts of probability. 1. In the case of probability₁, there is an
infinite number of regular c-functions based on an infinite number of
regular m-functions; in order to obtain numerical c-values, we have to
choose one of these functions. On the other hand, in the case of
probability₂, we take simply the cardinal numbers in order to find the
relative frequency (or its limit, if the domain of individuals is
infinite). [It is only in cases of a special kind that also for
probability₂ a measure function must be chosen. This becomes necessary if
not only properties like M₁ and M₂ are involved but physical magnitudes with
a continuous realm of values; this occurs only in languages essentially
richer than our systems 𝔏. The traditional term for this case is
‘geometrical probability’, because in the earliest examples of this kind the
magnitudes involved were spatial extensions.]
2. If a factual property M₁ is given, then the questions as to which things
have this property and what is their number are factual questions; the
answers are to be found empirically, by observations of the things
involved. Therefore, the statement of a probability₂ value for two given
properties is a factual statement. On the other hand, the question as to
which ℨ belong to the range of a given sentence e is a logical, not an
empirical, question; because, in order to answer it, it is sufficient to
understand the meaning of e, technically speaking, to know the semantical
rules for e; we need not know the facts referred to by e. And further, if a
function m is defined, then, on the basis of its definition, we can determine
the value of m(e), again without knowledge of facts. Thus, according to
D3, we find the value of c(h,e) in a purely logical way. Hence statements
of probability₁, in contradistinction to statements of probability₂, are not
factual but purely logical. This difference has been indicated earlier (in
§ 10); now we have a clearer understanding of the situation, since we
know now how the ranges are determined by the semantical rules and the
values of probability₁ are determined with the help of the measures
ascribed to the ranges. The comparison of probability₁ and probability₂
may be summed up as follows. Both concepts may be regarded as expressing
a numerical ratio for the partial inclusion of one class in another. For
probability₂ the two classes are, in general, determined by factual
properties; therefore the value is found empirically. For probability₁ the
two classes are ranges of sentences and hence determined logically;
therefore, the value is found in a purely logical way.


§ 56. Regular Functions for the Infinite System 


In accordance with an earlier consideration (in § 54A), we define now m(j)
for 𝔏_∞ as the limit of its values in the finite systems (D1), and
analogously c(h,e) for 𝔏_∞ as the limit of its values in the finite systems
(D2).


We have so far applied the concepts of regular m- and c-functions only
to the finite systems 𝔏_N. Now we shall apply them to the infinite system
𝔏_∞. In accordance with our previous considerations (in § 54A), we define
the values of those functions in 𝔏_∞ as limits of the values in finite
systems (D1 and D2). In what follows, ‘lim(. .)’ is always meant, unless
otherwise indicated, as short for ‘lim_{N→∞}(. .)’. (For the definition of
this concept, see D40-6a.)


+D56-1. Let _1m, _2m, etc., be a sequence of regular m-functions for
the sentences in 𝔏_1, 𝔏_2, etc. m is the regular m-function for the
sentences in 𝔏_∞ corresponding to this sequence =Df for every sentence j in
𝔏_∞ for which the limit exists, m(j) = lim _Nm(j); if the limit does not
exist, m has no value for j.


+D56-2. Let _1c, _2c, etc., be a sequence of regular c-functions for 𝔏_1,
𝔏_2, etc. c is the regular c-function for 𝔏_∞ corresponding to this
sequence =Df for any pair of sentences h,e in 𝔏_∞ for which the limit
exists, c(h,e) = lim _Nc(h,e); if the limit does not exist, c has no value
for h,e.


Instead of ‘m for 𝔏_∞’ and ‘c for 𝔏_∞’ we shall sometimes write ‘_∞m’ and
‘_∞c’, respectively.

Now let us see what is stated by D1 and D2. Suppose a sentence j in 𝔏_∞
is given. Then j occurs also in finite systems; suppose it occurs in 𝔏_m,
then it occurs likewise in every 𝔏_n where n > m. To j as a sentence in
𝔏_m, a certain real number r_m is assigned as its _{m}m-value; likewise, to
j as a sentence in 𝔏_{m+1}, a real number r_{m+1} (not necessarily different
from r_m) as its _{m+1}m-value, etc. Now, D1 says this: if the sequence of
the numbers r_m, r_{m+1}, r_{m+2}, etc., possesses a limit r, then this
number r is taken as _∞m(j), i.e., as the value of the function _∞m ascribed
to j as a sentence in 𝔏_∞. The situation with D2 is analogous. Suppose the
sentences h and e in 𝔏_∞ are given. Then h and e occur also in a finite
system, say, 𝔏_m; and hence also in 𝔏_{m+1}, 𝔏_{m+2}, etc. Then _nc(h,e) has
a value in all those systems of this sequence in which e is not L-false
(T55-2b). Now D2 says this: if the sequence of these c-values possesses a
limit q, then q is taken as _∞c(h,e), i.e., as the value of the function _∞c
for h and e as sentences in 𝔏_∞.


We shall briefly indicate the reason for our decision to base _∞m on the
sequence of the functions _Nm (i.e., _1m, _2m, etc.) and analogously _∞c on
the sequence of the functions _Nc. Without closer examination, one might
perhaps think that _∞m could be directly introduced as a function for the ℨ
in 𝔏_∞ in analogy to D55-1 and then extended to the sentences in 𝔏_∞ in
analogy to D55-2, i.e., by defining m(j) as the sum of m(ℨ_i) for all ℨ in
ℜ_j. However, this procedure is not possible for the following reasons.
First, the number of ℨ in 𝔏_∞ is not only infinite but nondenumerable; it is
ℵ_1 (T29-4c), hence equal to the cardinal number of the continuum
(T40-25b). Second, for many sentences j in 𝔏_∞, the number of ℨ in ℜ_j is
infinite; in many cases, for instance if j is any non-L-false molecular
sentence, the number is again ℵ_1. It would be possible to introduce the
m-functions as additive measure functions (in the sense of point-set theory)
for those classes of ℨ in 𝔏_∞ which are ranges of sentences in 𝔏_∞ (see the
remark following D58-1). In accordance with the intended meaning of
universal sentences, these functions should fulfil the following condition.
Let i be a universal sentence (e.g., ‘(x)(Mx)’); let j_n be the instance of
its scope with ‘a_n’ (‘Ma_n’); let k_n be the conjunction
j_1 . j_2 . ... . j_n; then m(i) = lim m(k_n) (for n → ∞). This, however, is
not essentially different from our present procedure using the systems 𝔏_N,
as can be seen in the following way. In our procedure we define _∞m(i) as
lim _Nm(i) (D1). Now, i is L-equivalent in 𝔏_N to k_N. Therefore,
_Nm(i) = _Nm(k_N). Hence, _∞m(i) = lim _Nm(k_N). Furthermore, the procedure
indicated would involve certain additional complications, because if we were
to define (directly in 𝔏_∞) c(h,e) simply as m(e . h)/m(e), which seems the
most natural way, then this procedure would have the serious disadvantage
explained below for W′ (see: Discussion of an alternative procedure).


Let _Nm, _∞m, _Nc, and _∞c be regular. Although _Nm has a value for every
sentence in 𝔏_N (D55-2), we see from D1 that _∞m does not necessarily
have a value for every sentence in 𝔏_∞. _Nc has a value for sentences h and
e in 𝔏_N only under a certain condition (T55-2b). A still stronger condition
must be fulfilled for _∞c to have a value. The following theorems state the
domains of the functions _∞m and _∞c, i.e., the conditions which the
arguments must fulfil in order to assure values for the functions.


T56-1. Let _Nm (N = 1, 2, etc.) be a sequence of regular m-functions
for sentences and _∞m the regular m-function for sentences in 𝔏_∞
corresponding to the sequence. Then, for a sentence j in 𝔏_∞, _∞m has a
value if and only if the sequence of the numbers _Nm(j) is convergent
(D40-6b).


T56-2. Let _Nc (N = 1, 2, etc.) be a sequence of regular c-functions
based on the functions _Nm, and _∞c the regular c-function for 𝔏_∞
corresponding to the sequence. Then, for a pair of sentences h,e in 𝔏_∞, _∞c
has a value if and only if the following two conditions are fulfilled.

a. Those N for which _Nm(e) ≠ 0 (and hence e is not L-false in 𝔏_N)
form an infinite sequence.

b. For these N, the sequence of the numbers _Nc(h,e) is convergent.
(From T55-2b, D2.)


Discussion of an alternative procedure. On the basis of a function _N m for the
sentences in 𝔏_N, _N c is defined (D55-3 and 4). On the same basis, _∞m is defined
(D1). Then there are two possible ways for defining _∞c. The way W, which
we have chosen, is expressed by D2; here, _∞c is defined as the limit of _N c, in
analogy to D1. The alternative way W' would consist in defining _∞c not on the
basis of _N c but on the basis of _∞m; in analogy to D55-3 and 4, _∞c(h,e) would
be defined as _∞m(e . h)/_∞m(e). In all cases where W' assigns a value to _∞c, our
way W yields the same value (see below, T4a). On the other hand, W' does not
assign a value in certain cases where W does. For W' to assign a value to
_∞c(h,e), it is required that _∞m(e) ≠ 0, in analogy to T55-2b. The requirement
in the case of W, which has been stated in T2, is weaker. For there may be
sentences e of the following kind (for our function m*, any sentence of the form
'(x)(Mx)' would be an example, see the remark following (5) in § 110A). e is
factual both in the systems 𝔏_N and in 𝔏_∞; nevertheless, _∞m(e) = 0 because the
sequence _N m(e), although consisting of positive numbers, converges toward the
limit 0 (as, e.g., in T40-22). (In this case we shall later call e almost L-false,
D58-1b.) If e is of this kind, the procedure W' assigns no value to _∞c(h,e).
On the other hand, for every N, _N c(h,e) has a value; and if the sequence of
these values has a limit, say, r, then, on the basis of W, i.e., D2, _∞c(h,e) has
the value r. For instance, for every N, _N c(t,e) = 1 (T59-1d), and hence
_∞c(t,e) = 1. This feature of W' is a serious disadvantage.
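
The difference between W and W' can be made concrete with a small numerical sketch. The m-values used here are an assumption chosen only for illustration (equal weights for the 2^N state-descriptions of a single predicate 'M', which is not the function m* of the text), and the helper names are hypothetical.

```python
from fractions import Fraction

def m_N_universal(N):
    # Assumed toy value: with one predicate 'M' and all 2**N state-descriptions
    # of L_N weighted equally, '(x)(Mx)' holds in exactly one of them.
    return Fraction(1, 2 ** N)

def c_N_taut_given_universal(N):
    # c_N(t, e) = m_N(e . t) / m_N(e); e . t is L-equivalent to e.
    m_e = m_N_universal(N)
    return m_e / m_e

def way_W_prime(m_inf_eh, m_inf_e):
    # W': quotient of the two limit m-values; no value if the denominator is 0.
    if m_inf_e == 0:
        return None
    return m_inf_eh / m_inf_e

m_values = [m_N_universal(N) for N in range(1, 21)]
c_values = [c_N_taut_given_universal(N) for N in range(1, 21)]

# The m-values are positive but shrink toward 0, so the limit m-value of e is 0
# and W' yields no value; the c-values are constantly 1, so W yields the limit 1.
w_prime_result = way_W_prime(Fraction(0), Fraction(0))
```

Under these assumptions every finite system gives c(t,e) = 1, while the quotient of the limit values would be 0/0.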


T56-4. Let _N m (N = 1, 2, etc.) be any sequence of regular m-functions
for the ℨ. Let the following functions be defined on this basis: the
sequence of functions _N m for sentences (D55-2), the function _∞m (D1), the
sequence of functions _N c (D55-3 and 4), and the function _∞c (D2).

a. For any sentences h and e in 𝔏_∞, if _∞m has values for e . h and for e
and the latter value is not 0, then

        _∞c(h,e) = _∞m(e . h) / _∞m(e).

Proof. Let the conditions be fulfilled. Then there is an m (T1) such that,
for every N > m, _N m(e) > 0 and _N c(h,e) = _N m(e . h)/_N m(e) (D55-4 and 3),
hence _N c(h,e) × _N m(e) = _N m(e . h). Therefore lim (_N c(h,e)) × lim (_N m(e)) =
lim (_N m(e . h)) (T40-21c), and hence _∞c(h,e) × _∞m(e) = _∞m(e . h) (D2, D1).
Since the second factor is assumed to be positive, the theorem follows.

b. If e is L-false in 𝔏_∞, then _∞c(h,e) has no value.

Proof. If the condition is fulfilled, there is an m such that, for every N > m,
e is L-false in 𝔏_N (T20-11b) and hence _N c(h,e) has no value (T55-2b). Hence
the theorem (T2a).


In connection with T4a, it is to be noted that, as mentioned earlier,
_∞c(h,e) has sometimes a value even in cases where _∞m(e) = 0; here, also,
_∞m(e . h) = 0, and hence the _∞c-value cannot be represented as a
quotient of the two _∞m-values. (See the discussion preceding T4.)

For the sake of simplicity, the definitions and theorems concerning m-
and c-functions are formulated in this book only for sentences as
arguments. They can, however, easily be extended so as to apply also to
classes of sentences. The definitions of 'range' (D18-6b) and of the
L-concepts (D20-1) are applicable to classes of sentences. In a finite system
𝔏_N, the m- and c-functions can be applied in a simple way to classes of
sentences. For every such class there is a sentence representing it in the sense
of being L-equivalent to it (T21-8e); hence we can define the values of
those functions for any given classes by their values for the representing
sentences. In the infinite system 𝔏_∞, this simple procedure cannot be
used; but the same aim can be achieved by other means.


A procedure for 𝔏_∞ may briefly be indicated. If a class of sentences 𝔎_i in
𝔏_∞ is given, we define the corresponding class _N𝔎_i in 𝔏_N as the class of those
sentences of 𝔎_i which occur in 𝔏_N. Then we define _∞m(𝔎_i) as the limit of
_N m(_N𝔎_i), and analogously for _∞c. Let 𝔎_i be the class of all full sentences
of 'M' in 𝔏_∞. Then the corresponding class _N𝔎_i contains the full sentences of
'M' with 'a_1' through 'a_N'. In this case, there is a sentence in 𝔏_∞
L-equivalent to 𝔎_i, viz., '(x)(Mx)'; let this be i. This same sentence i in 𝔏_N
is L-equivalent to the corresponding class _N𝔎_i in 𝔏_N. Thus in a case of this
kind the new definition for m(𝔎_i) is in accord with the old definition in this
sense: m has for 𝔎_i the same value as for the sentence i representing 𝔎_i.
Therefore it seems plausible to accept the same definition also for any class
for which there is no L-equivalent sentence.


§ 57. Null Confirmation; Fitting Sequences 


A. Some fundamental theorems concerning regular m-functions are stated
(T1).

B. The confirmation of a hypothesis j on the tautological evidence 't', in
other words, the confirmation of j before any factual knowledge is available, is
called the null confirmation (or initial confirmation) of j, in symbols: 'c_0(j)'
(D1). It turns out that c_0 coincides with m (T3).

C. Suppose we have a sequence of regular m-functions _1m, _2m, etc., one for
each finite system. If these functions fit together in a certain sense, the
sequence is called a fitting m-sequence (D3, D4). And the sequence of regular
c-functions based upon those m-functions is called a fitting c-sequence (D5).
That the functions of a fitting sequence actually fit together is shown by the
following results: for a nongeneral sentence, all m-functions of the sequence have
the same value (T5); and for a pair of nongeneral sentences, all c-functions
of the sequence have the same value (T6).


A. Theorems on Regular m-Functions 


The theorems in T1 concern regular m-functions for finite or infinite
systems. These theorems serve chiefly for two purposes: (i) as lemmas for
later theorems concerning regular c-functions, and (ii) as theorems
concerning the null confirmation c_0, since this function coincides with m (T3).

T57-1. Let _N m (N = 1, 2, etc.) be a sequence of regular m-functions,
and _∞m the corresponding regular m-function for 𝔏_∞. Then the following
assertions (a) to (x) hold both (1) if m is any _N m (with respect to any
sentences in 𝔏_N), and (2) if m is _∞m (with respect to any sentences in 𝔏_∞
for which _∞m has a value, compare T56-1). The proofs or references
indicating proofs concern _N m under (1), and _∞m under (2). (Concerning the
use of '⊢', see the remark preceding T20-1.)


+a. If ⊢ i ≡ j, then m(i) = m(j). (1. From D20-1d, D55-2. 2. From
    T20-11, T40-21e.)
+b. If i is L-false, m(i) = 0. (1. From D55-2a. 2. From T20-11b,
    T40-21e.) (For a restricted converse of this theorem, applying to all
    sentences in 𝔏_N and to the nongeneral sentences in 𝔏_∞, see T58-1a.)
c. m(~t) = 0. (From (b).)
d. If i is L-true, m(i) = 1. (1. From D20-1a, D55-2b, D55-1b. 2. From
    T20-11a, T40-21e.) (For a restricted converse see T58-1c.)
e. m(t) = 1. (From (d).)
f. If 0 < m(i) < 1, then i is factual. (From (b), (d).) (For a
    restricted converse see T58-1e.)
g. 0 ≤ m(i) ≤ 1. (1. From D55-2 and 1. 2. From D56-1.)
h. If ⊢ i ⊃ j, then m(i) ≤ m(j). (1. From D20-1c, D55-2, D55-1a. 2.
    From T20-11, T40-21f.)
i. m(i . j) ≤ m(i). (From (h).)
j. m(i) ≤ m(i ∨ j). (From (h).)
k. m(i ∨ j) = m(i) + m(j) − m(i . j).

Proof. 1. ℜ(i ∨ j) is ℜ_i ∪ ℜ_j (T18-1f). Therefore, _N m(i ∨ j) = Σ _N m(ℨ) for
the ℨ in ℜ_i ∪ ℜ_j = Σ _N m(ℨ) for the ℨ in ℜ_i plus Σ _N m(ℨ) for the ℨ in ℜ_j
minus Σ _N m(ℨ) for the ℨ in ℜ_i ∩ ℜ_j (the latter sum must be subtracted
because these ℨ belong both to ℜ_i and to ℜ_j and hence their _N m-values are
counted twice in the first two terms). Hence theorem (with T18-1g). 2. From
T40-21a and b.

l. m(i . j) = m(i) + m(j) − m(i ∨ j). (From (k).)
+m. If m(i . j) = 0, hence in particular (from (b)) if i and j are
    L-exclusive (in other words, i . j is L-false, ⊢ i ⊃ ~j), then
    m(i ∨ j) = m(i) + m(j). (From (k), (b).)
n. See v.
+p. m(~i) = 1 − m(i). (1. From T18-1e, D55-2, D55-1. 2. From (1),
    T40-21b.)
q. If m(i . j) = 1, then m(i) = m(j) = 1. (From (i), (g).)
r. m(i) = m(i . j) + m(i . ~j). (From T21-5j(2), (a), (m).)
s. If m(i) = 0, m(i . j) = 0. (From (i).)
t. If m(i ∨ j) = 1, hence in particular (from (d)) if i and j are
    L-disjunct (i.e., ⊢ i ∨ j), then m(i . j) = m(i) + m(j) − 1. (From (l).)
u. If m(i . ~j) = 0, hence (from (p)) m(~i ∨ j) = 1, then
    (1) m(i) = m(i . j) (from (r));
    (2) m(i) ≤ m(j). (From (1), (i).)
v. Let j be j_1 ∨ j_2 ∨ ... ∨ j_n (n ≥ 2). For any two different components
    j_m and j_p, let m(j_m . j_p) = 0. (This is the case in particular if
    j_1, ..., j_n are L-exclusive in pairs (D20-1g).) Then

        m(j) = Σ m(j_p)   (summed for p = 1 to n).

Proof. 1. The assertion holds for n = 2 (m). 2. For n > 2, let us assume
that the assertion holds for n − 1; we shall show that then it holds likewise
for n. Let j' be j_1 ∨ j_2 ∨ ... ∨ j_{n−1}. j' . j_n is L-equivalent to
(j_1 . j_n) ∨ (j_2 . j_n) ∨ ... ∨ (j_{n−1} . j_n) (T21-5m(2)). For every
component of the latter disjunction m = 0, hence also for the conjunction of
any two of these components (s). The disjunction has n − 1 components.
Therefore, according to our assumption, m(j' . j_n) is the sum of the m of the
components, hence 0. Therefore, since j is j' ∨ j_n, m(j) = m(j') + m(j_n) (m).
Again according to our assumption, m(j') = Σ m(j_p) (summed for p = 1 to n − 1).
Hence m(j) = Σ m(j_p) (summed for p = 1 to n). The assertion for every n ≥ 2
follows from (1) and (2) by mathematical induction.

w. If m(i) = m(j) = 0, then m(i ∨ j) = 0.

Proof. m(i . j) = 0 (i). Hence the assertion with (k).

x. If m(i) = m(j) = 1, then m(i . j) = 1.

Proof. m(i ∨ j) = 1 (j). Hence the assertion with (l).
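
Some of the assertions of T57-1 can be checked mechanically in a small finite system. The following is only a sketch under stated assumptions: one predicate 'M', two individuals, sentences represented by their ranges, and arbitrarily chosen positive weights; the names `STATES`, `WEIGHTS`, `rng`, and `m` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# State-descriptions of a toy system with one predicate 'M' and two
# individuals a1, a2: each is a pair of truth-values (Ma1, Ma2).
STATES = list(product([True, False], repeat=2))

# Assumed weights: positive and summing to 1, as required of a regular
# m-function for state-descriptions; the particular numbers are arbitrary.
WEIGHTS = dict(zip(STATES, [Fraction(1, 2), Fraction(1, 4),
                            Fraction(1, 8), Fraction(1, 8)]))

def rng(sentence):
    """Range of a sentence: the state-descriptions in which it holds."""
    return frozenset(s for s in STATES if sentence(s))

def m(r):
    """m-value of a sentence, given its range r."""
    return sum((WEIGHTS[s] for s in r), Fraction(0))

i = rng(lambda s: s[0])            # 'Ma1'
j = rng(lambda s: s[1])            # 'Ma2'
not_i = frozenset(STATES) - i
not_j = frozenset(STATES) - j

check_k = m(i | j) == m(i) + m(j) - m(i & j)    # general addition, (k)
check_p = m(not_i) == 1 - m(i)                  # negation, (p)
check_r = m(i) == m(i & j) + m(i & not_j)       # (r)
```

Union, intersection, and complement of ranges stand in for disjunction, conjunction, and negation, in line with the range theorems cited in the proof of (k).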


B. Null Confirmation 

In accordance with an earlier preliminary explanation (in § 54B), we
shall now introduce the symbol 'c_0' for the null confirmation or initial
confirmation, i.e., the confirmation on the tautological basis 't' (D1).
Then it can easily be shown that c_0 coincides with m (T3); hence T1
applies also to c_0.

+D57-1. Let c be a regular c-function for a finite or infinite system 𝔏.
For every sentence j in 𝔏,

        c_0(j) =Df c(j,t).

+T57-3. Let m be a regular m-function for the sentences of a finite or
infinite system 𝔏. Let c be the corresponding regular c-function for 𝔏, and
c_0 the corresponding null confirmation (D1). Then, for every sentence j
in 𝔏 (in 𝔏_∞, provided m has a value for j)

        c_0(j) = m(j).

Proof. c_0(j) = c(j,t) (D1), = m(t . j)/m(t) (T55-2a for 𝔏_N, T56-4a for 𝔏_∞),
= m(j) (T21-5s(1), T1a, T1e).
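
The coincidence of null confirmation with m can be illustrated for a finite system. This is a minimal sketch, assuming four state-descriptions with arbitrarily chosen weights and identifying each sentence with its range; the names `WEIGHTS`, `TAUTOLOGY`, `c`, and `c0` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Four state-descriptions of a toy finite system; the labels and weights are
# assumptions for illustration (any positive weights summing to 1 would do).
WEIGHTS = {
    "Z1": Fraction(1, 2),
    "Z2": Fraction(1, 4),
    "Z3": Fraction(1, 8),
    "Z4": Fraction(1, 8),
}
TAUTOLOGY = frozenset(WEIGHTS)   # range of 't': all state-descriptions

def m(r):
    """m-value of a sentence, identified here with its range r."""
    return sum((WEIGHTS[z] for z in r), Fraction(0))

def c(h, e):
    """c(h,e) = m(e . h) / m(e); no value when m(e) = 0."""
    if m(e) == 0:
        raise ValueError("c(h,e) has no value: m(e) = 0")
    return m(e & h) / m(e)

def c0(j):
    """Null confirmation: confirmation on the tautological evidence 't'."""
    return c(j, TAUTOLOGY)

# Every possible range in this system, from the empty range to the full one.
all_ranges = [frozenset(s) for k in range(5) for s in combinations(WEIGHTS, k)]
agree = all(c0(j) == m(j) for j in all_ranges)
```

Since m(t) = 1 and t . j has the same range as j, the quotient reduces to m(j) for each of the sixteen ranges.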

The concept of null confirmation is very important for inductive logic.
c_0(h) is the degree of confirmation of the hypothesis h on the evidence 't'
and hence on any L-true evidence, in other words, the degree of
confirmation of h before any factual information is available. This concept may
look suspicious at the first glance; one might perhaps think that, as long
as we have no factual knowledge, we have no right to say anything about
h. An objection of this kind, however, is based on a misconception of the
nature of c and of its explicandum, probability_1. As earlier explained
(§ 10A), a statement of probability_1 is, if true, L-true, not factual; hence
the same holds for a sentence of the form 'c(h,e) = r'. This sentence is
a semantical and, more specifically, an L-semantical sentence which
states a logical relation between the sentences h and e of the object
language but does not say anything about facts. In this respect it has the
same nature as an L-semantical sentence of the form 'e L-implies h'.
Consequently, a sentence of the form 'c_0(h) = q' states a certain logical
property of the sentence h without saying anything about facts. More
specifically, it states a numerical value of a purely logical function for the
argument h; this value is dependent only upon what is usually called the
meaning of h or, in more technical terms, the range of h as determined by the
semantical rules of the language system 𝔏 to which h belongs; it is not
dependent in any way upon the contingency of facts, e.g., upon the
question whether h is true or false. We shall later come back to the question
of the legitimacy of the concept of null confirmation (§ 108).

Earlier authors in the theory of probability_1 have sometimes used the
term 'probability a priori' for the concept under discussion. Later authors
have preferred to avoid this term because of its ambiguity and to use
instead other terms, e.g., 'initial probability' ('Anfangswahrscheinlichkeit').
The term 'probability a priori' and its counterpart 'probability a
posteriori' have been used in at least three different meanings. 1. 'Probability
a priori' for confirmation by L-true evidence, 'probability a posteriori' for
confirmation by a factual evidence. 2. In theories of probability based on
the principle of indifference, the term 'probability a priori' is often used
in cases where the probability is calculated chiefly with the help of that
principle, even if factual knowledge is used, provided this knowledge is
not of a statistical nature. On the other hand, a probability is called 'a
posteriori' if it is calculated chiefly on the basis of statistical information
(reports of experiments, social statistics, and the like). In particular, with
respect to results of games of chance, 'probability a priori' is used if the
evidence gives information only about the general conditions of the game
(e.g., symmetry of a die or roulette, physical similarity of cards, and the
like), while 'probability a posteriori' refers to evidence including
statistical results of earlier games. 3. Let h be a law or another hypothesis, and
i a prediction concerning the result of a new experiment which we plan
to make in order to test h; let e express the knowledge we have before we
observe the result of the new experiment; e may include the results of any
number of previous observations relevant to h. Then 'probability a priori'
is sometimes used for c(h,e), and 'probability a posteriori' for c(h, e . i).
(We shall later use the terms 'prior confirmation' and 'posterior
confirmation' instead; see § 60.) It seems to me that, if the two terms are to be used
at all, (1) is the only appropriate use, because the only one in accordance
with the customary, Kantian meanings of the terms 'a priori' and 'a
posteriori'. The usages (2) and (3) should be avoided; the use of 'probability
a priori' in these cases, when the evidence is factual and empirical, is
quite misleading.

It must be admitted that some earlier authors have violated the
principle of empiricism by certain statements concerning probability a priori.
Other authors were right in criticizing these statements (cf. §§ 41D and
42B). The decisive point for our present discussion is the fact that the
violation is to be blamed not on the concept c_0 itself but on its misuse.
This misuse was chiefly due to a lack of distinction between probability_1
and probability_2. There is no analogue to c_0 for probability_2, since
relative frequency has no value within the null class.


C. Fitting Sequences 

The remainder of this section deals with a technical problem which has,
however, no fundamental significance. We have found earlier that
m-functions (or c-functions) for different finite systems must fulfil a certain
requirement in order to fit together (§ 54E(6)). We now call a sequence
of regular m-functions for ℨ for all finite systems a fitting m-sequence if
they fulfil this requirement (D3). The concept of a fitting m-sequence for
sentences is then defined on this basis (D4), in analogy to our earlier
extension of m-functions for ℨ to m-functions for sentences (D55-2).
Finally, the concept of a fitting c-sequence is defined on this basis in an obvious
way (D5), in analogy to D55-4. (In what follows, the ℨ in 𝔏_N are
designated by '_Nℨ', and the ranges in 𝔏_N by '_Nℜ'.)

D57-3. A sequence of functions _1m, _2m, _3m, etc., is a regular fitting
sequence of m-functions for ℨ or, briefly, a fitting m-sequence for ℨ =Df
the following two conditions are fulfilled.

a. For every N (= 1, 2, etc.), _N m is a regular m-function for the
   ℨ in 𝔏_N.

b. For every N and every _Nℨ_i (i.e., ℨ_i in 𝔏_N), _N m(_Nℨ_i) is equal to the
   sum of the _{N+1}m-values for all those _{N+1}ℨ of which _Nℨ_i is a
   subconjunction.

D57-4. A sequence of functions _1m, _2m, _3m, etc., is a regular fitting
sequence of m-functions for sentences or, briefly, a fitting m-sequence for
sentences =Df there is a fitting m-sequence for ℨ, _1m', _2m', etc., and, for every
N, _N m is a regular m-function for the sentences in 𝔏_N such that, for every
non-L-false sentence j in 𝔏_N, _N m(j) is the sum of the _N m'-values for the
ℨ in ℜ_j.

D57-5. A sequence of functions _1c, _2c, _3c, etc., is a regular fitting
sequence of confirmation functions or, briefly, a fitting c-sequence =Df
there is a fitting m-sequence for sentences _1m, _2m, etc., such that, for every
N, _N c is the regular c-function based upon _N m (D55-3).
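
Condition D57-3b can be tested for a concrete sequence. The sketch below assumes a language with a single predicate 'M' and gives all state-descriptions of 𝔏_N equal weight; this choice of sequence and the helper names are assumptions for illustration only, not a function singled out by the text.

```python
from fractions import Fraction
from itertools import product

def states(N):
    """State-descriptions of L_N for one predicate 'M': tuples (Ma1, ..., MaN)."""
    return list(product([True, False], repeat=N))

def m_state(z):
    # Assumed toy choice: equal weight for all state-descriptions of L_N.
    return Fraction(1, 2 ** len(z))

def extensions(z):
    """State-descriptions of L_{N+1} that contain z as a subconjunction."""
    return [z + (v,) for v in (True, False)]

def fits(N):
    """Condition D57-3b for the step from L_N to L_{N+1}."""
    return all(m_state(z) == sum(m_state(w) for w in extensions(z))
               for z in states(N))

def m_sentence(sentence, N):
    """m-value in L_N of a sentence: sum over the state-descriptions in its range."""
    return sum((m_state(z) for z in states(N) if sentence(z)), Fraction(0))

ma1 = lambda z: z[0]          # nongeneral sentence 'Ma1'
universal = lambda z: all(z)  # '(x)(Mx)', read in L_N as Ma1 . ... . MaN

ma1_values = [m_sentence(ma1, N) for N in range(1, 6)]
universal_values = [m_sentence(universal, N) for N in range(1, 6)]
```

For this sequence the nongeneral sentence keeps the same m-value in every system, while the value of the general sentence changes with N.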


The following theorem T5 says that, with respect to a fitting
m-sequence, any nongeneral sentence j has the same m in all systems 𝔏. This
is as it should be, since a nongeneral sentence has the same meaning in all
systems (§ 15A). Thus, T5 shows that D3 and D4 are adequate, i.e.,
that these definitions define indeed the "fitting together" of m-functions
for different systems in the sense intended. It is then easily seen that D5
is likewise adequate, i.e., that it effects the "fitting together" of
c-functions; this is stated in T6.


T57-5. Let _1m, _2m, etc., be a fitting m-sequence for sentences, and _∞m
the corresponding regular m-function for 𝔏_∞. Let j be a nongeneral
sentence in 𝔏_N, and hence also in 𝔏_{N+1}, in 𝔏_{N+m} for any m, and in 𝔏_∞.

a. _N m(j) = _{N+1}m(j).

Proof. 1. Let j be L-false in 𝔏_N. Then it is likewise L-false in 𝔏_{N+1} (T20-8b).
Therefore, in both systems, m = 0 (D55-2a). 2. Let j be not L-false in 𝔏_N.
Then it is not L-false in 𝔏_{N+1} (T20-8b). Therefore, neither _Nℜ_j (i.e., the
range of j in 𝔏_N) nor _{N+1}ℜ_j is null. _{N+1}ℜ_j is the class of those _{N+1}ℨ
of which the sentences in _Nℜ_j are subconjunctions (T19-8); and for every
_{N+1}ℨ in _{N+1}ℜ_j there is just one _Nℨ in _Nℜ_j which is a subconjunction of
it. Let _Nℨ_i be any ℨ in _Nℜ_j. Then Σ _{N+1}m(ℨ) for those _{N+1}ℨ which contain
_Nℨ_i as a subconjunction equals _N m(_Nℨ_i) (D3b). Therefore, Σ _N m(ℨ) for the
ℨ in _Nℜ_j equals Σ _{N+1}m(ℨ) for the _{N+1}ℨ in _{N+1}ℜ_j. _N m(j) is the first
of these two sums (D55-2b), _{N+1}m(j) is the second. Hence these two m-values
are equal.

b. _N m(j) = _{N+m}m(j) for any m. (From (a), by mathematical induction.)

c. _∞m(j) = _N m(j). (From (b), T40-21d.)

T57-6. Let _1c, _2c, etc., be a fitting c-sequence, and _∞c the corresponding
regular c-function for 𝔏_∞. Let e and h be nongeneral sentences in 𝔏_N (and
hence in 𝔏_{N+1}, in 𝔏_{N+m} for any m, and in 𝔏_∞), and e be not L-false in 𝔏_N.

a. e is not L-false in 𝔏_{N+1}; and _N c(h,e) = _{N+1}c(h,e). (From T20-8b;
   T55-2a, T5a.)

b. For any m, e is not L-false in 𝔏_{N+m}; and _N c(h,e) = _{N+m}c(h,e). (From
   T5b.)

c. e is not L-false in 𝔏_∞; and _∞c(h,e) = _N c(h,e). (From T20-10b; (b),
   T40-21d.)


§ 58. Almost L-true Sentences 


If i is not L-true but, as for L-true sentences, m(i) = 1, then we call i an
almost L-true sentence (D1a). i is called almost L-false if i is not L-false but
m(i) = 0 (D1b, T3a). Sentences of this kind can occur only among general
sentences in 𝔏_∞ (T3d, e). The terms 'almost L-implies' and 'almost
L-equivalent' are defined analogously (D1c, d).


General sentences (i.e., sentences containing variables) in 𝔏_N can
always be transformed into L-equivalent nongeneral sentences (T22-3).
In 𝔏_∞, however, this does in general not hold. Therefore, certain theorems
in inductive logic are stated for all sentences in finite systems but only
for nongeneral sentences in 𝔏_∞. The following theorem T1 concerning
m-functions belongs to this kind, and later T59-5 concerning c-functions.


T58-1. The subsequent assertions (a) to (l) hold under each of the
following two assumptions: (i) Let m be a regular m-function for the
sentences in 𝔏_N and m' (in (h) to (k)) any other such function; let i and j be
any sentences in 𝔏_N. (ii) Let m be a regular m-function for the sentences
in 𝔏_∞ (D56-1) corresponding to any fitting m-sequence for sentences
(D57-4), and m' (in (h) to (k)) any other such function corresponding to
any other such sequence; let i and j be any nongeneral sentences in 𝔏_∞.
(The proofs and references given below are for the assertions in case (i);
the assertions in case (ii) follow with the help of T20-10 and T57-5c.)

a. If i is not L-false, m(i) > 0. (From D20-1b, D55-2b, D55-1a.)
+b. m(i) = 0 if and only if i is L-false. (From D55-2a; (a).)
c. If i is not L-true, m(i) < 1. (From D20-1a, D55-2, D55-1.)
+d. m(i) = 1 if and only if i is L-true. (From T57-1d; (c).)
e. If i is factual, 0 < m(i) < 1. (From (a), (c).)
+f. 0 < m(i) < 1 if and only if i is factual. (From T57-1f; (e).)
g. If ⊢ i ⊃ j but not ⊢ j ⊃ i, then m(i) < m(j).

Proof. Let the conditions be fulfilled. Then ℜ_i ⊂ ℜ_j but not the converse
(D20-1c). Thus, all ℨ of ℜ_i belong to ℜ_j, but there is a ℨ_k in ℜ_j which does
not belong to ℜ_i. m(ℨ_k) > 0 (D55-1a); hence the theorem (with D55-2).

h. If m(i) = 0, m'(i) = 0. (From (b).)
i. If m(i) > 0, m'(i) > 0. (From (h).)
j. If m(i) = 1, m'(i) = 1. (From (d).)
k. If m(i) < 1, m'(i) < 1. (From (j).)
l. Let i be a factual sentence in 𝔏; hence 0 < m(i) < 1 (e). Let r be
   an arbitrary real number such that 0 < r < 1. Then there is a
   regular m-function m'' for the sentences in 𝔏 such that m''(i) = r.

Proof (for 𝔏_N). ℜ_i is neither empty nor does it contain all ℨ (in 𝔏_N). We
construct m'' as a function for the ℨ (D55-1) by distributing r in an arbitrary
way, e.g., in equal amounts, among the ℨ in ℜ_i, and 1 − r in an arbitrary way,
e.g., in equal amounts, among the remaining ℨ. Then we extend m'' to a
function for the sentences (D55-2). Then m''(i) = r.
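
The construction used in the proof of (l) is easy to carry out explicitly. The following is a sketch under the proof's own suggestion of equal amounts; the function name `build_m` and the four state-description labels are hypothetical.

```python
from fractions import Fraction

def build_m(states, range_i, r):
    """Spread r equally over range_i and 1 - r equally over the remaining states."""
    inside = [z for z in states if z in range_i]
    outside = [z for z in states if z not in range_i]
    if not inside or not outside:
        raise ValueError("i must be factual: its range is neither empty nor total")
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    weights = {z: r / len(inside) for z in inside}
    weights.update({z: (1 - r) / len(outside) for z in outside})
    return weights

# Assumed toy system: four state-descriptions, i holding in three of them.
states = ["Z1", "Z2", "Z3", "Z4"]
range_i = {"Z1", "Z2", "Z3"}
w = build_m(states, range_i, Fraction(2, 7))

m_i = sum(w[z] for z in range_i)   # value of the constructed function for i
```

Every state-description receives a positive weight and the weights sum to 1, so the constructed function is regular in the sense required; the guard clauses reflect the two hypotheses of the theorem (i factual, r strictly between 0 and 1).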

Some of the theorems in T1 are restricted converses of theorems which
hold without restriction for all sentences in 𝔏_∞. In this way, T1a, c, e
correspond to T57-1b, d, f, respectively. We shall now explain why the
restriction to nongeneral sentences in 𝔏_∞ is necessary. This will give
occasion for the introduction of new terms, the 'almost-L'-terms.

As mentioned earlier (in § 56, Discussion of an alternative procedure),
for a given sequence _N m of regular m-functions and the corresponding
function _∞m the following may happen. (For our function m*, '(x)(Mx)'
was mentioned as an example.) There is a sentence e of the following kind.
e is factual both in 𝔏_∞ and in the systems 𝔏_N; hence, _N m(e) > 0 for every
N; and, in particular, these positive values are such that they converge
with increasing N toward 0; hence _∞m(e) = 0 (D56-1). e has the latter
property in common with L-false sentences (T57-1b), although e is not
L-false but factual. We shall call sentences of this kind almost L-false
(D1b). Because of the existence of almost L-false sentences, T1a and
hence T1b cannot be asserted without restriction. Since we have proved
T1a for all sentences in finite systems and for all nongeneral sentences
in 𝔏_∞, almost L-false sentences can only occur among the general
sentences in 𝔏_∞ (T3e). If e has the properties mentioned, then ~e is not
L-true; nevertheless, ~e has the _∞m-value 1 (T57-1p) like L-true
sentences (T57-1d). We shall call sentences of this kind almost L-true (D1a).
Because of their existence, T1c and hence T1d cannot be asserted
without restriction. The concepts of almost L-true and almost L-false
sentences will prove useful in our system of inductive logic; that is the reason
why we introduce special terms for them. Other 'almost-L'-terms will
be defined analogously (D1c and d), but they are less important.


D58-1. Let m be a regular m-function for the sentences in 𝔏_∞, and i
and j be sentences in 𝔏_∞.

+a. i is almost L-true (in 𝔏_∞, with respect to m) =Df i is not L-true
    (in 𝔏_∞) but m(i) = 1.

+b. i is almost L-false (in 𝔏_∞, with respect to m) =Df ~i is almost
    L-true.

c. i almost L-implies j (in 𝔏_∞, with respect to m) =Df i ⊃ j is almost
    L-true.

d. i is almost L-equivalent to j (in 𝔏_∞, with respect to m) =Df i ≡ j is
    almost L-true.


A remark on the choice of the term 'almost L-true'. Let m be a regular
m-function for the sentences in 𝔏_∞. For every sentence i in 𝔏_∞ for which m(i) has a
value (T56-1), let us assign this value to the class ℜ_i as its measure. Thereby,
in the domain whose elements are the ℨ in 𝔏_∞, we have defined an additive
measure function for some classes of elements. A measure function f is called
additive if the following holds: for any two mutually exclusive classes 𝔎_i and
𝔎_j for which f has values, f(𝔎_i ∪ 𝔎_j) = f(𝔎_i) + f(𝔎_j). This condition is
fulfilled for the function described. [Proof. Let i and j be any sentences in 𝔏_∞
for which m has values, and let ℜ_i and ℜ_j be exclusive classes, i.e., ℜ_i ∩ ℜ_j,
and hence ℜ(i . j), is empty. Then i . j is L-false in 𝔏_∞. Therefore
f(ℜ_i ∪ ℜ_j) = m(i ∨ j) = m(i) + m(j) (T57-1m), = f(ℜ_i) + f(ℜ_j).] Now, in the
terminology of mathematics (theory of measure functions, based on set theory),
with respect to the elements of a domain within which an additive measure
function for certain classes is defined, one says that almost all elements have a
certain property if all elements have this property with the exception of some
whose class has the measure zero. According to this usage, if a sentence i in
𝔏_∞ is not L-true but such that m(i) = 1, we should say that i holds in almost
all ℨ. [This is seen as follows. The ℨ in which i does not hold are those
belonging to ℜ(~i) (T18-1e). The measure of ℜ(~i) is m(~i) = 1 − m(i) (T57-1p),
= 0.] Since now a sentence which holds in all ℨ is called L-true (D20-1a), it
seems not unnatural to call i, which holds in almost all ℨ, almost L-true.
Analogously, ~i fails to hold in all ℨ except those of ℜ(~i), which is a class of
measure zero; hence, ~i fails to hold in almost all ℨ. Thus it seems natural to
call ~i almost L-false. [The phrase 'almost all' is sometimes used in
mathematical terminology in a somewhat different sense, meaning 'all elements
(of an infinite domain) except a finite number of them'. It may be noted that in
this sense the phrase does not apply to our case. The class of those ℨ in which
an almost L-true sentence i does not hold, namely, ℜ(~i), is in general infinite;
the essential point is that its measure is nevertheless zero.]


T58-3. Let m be a regular m-function for the sentences in 𝔏_∞, and i and j
be sentences in 𝔏_∞. Then the following holds (with respect to m, in 𝔏_∞).

+a. i is almost L-false if and only if i is not L-false but m(i) = 0. (From
    D1b, T57-1p.)
b. i almost L-implies j if and only if i does not L-imply j but m(i ⊃ j) =
    1 (hence, m(i . ~j) = 0). (From D1c.)
c. i is almost L-equivalent to j if and only if i is not L-equivalent to j
    but m(i ⊃ j) = m(j ⊃ i) = 1. (From D1d, T57-1q.)
+d. If i is almost L-true, then i is general (i.e., contains a variable).
    (From T1c(ii).)
+e. If i is almost L-false, then i is general. (From D1b, (d).)
f. If i is almost L-true and ⊢ i ⊃ j, then m(j) = 1, and hence j is either
    L-true or almost L-true. (From T57-1h.)
g. If j is almost L-false and ⊢ i ⊃ j, then m(i) = 0, and hence i is either
    L-false or almost L-false. (From T57-1h.)
h. If i is almost L-true, then i is factual in 𝔏_∞.

Proof. 1. i is not L-true (D1a). 2. i is not L-false, because otherwise m(i) = 0
(T57-1b).

i. If i is almost L-false, then i is factual in 𝔏_∞. (From D1b, (h),
    T20-6a.)


Let us go back to the question why the restriction of T1(ii) to
nongeneral sentences in 𝔏_∞ is necessary. Let i be almost L-true and hence ~i
almost L-false. T1a to d have been explained earlier. T1e and f hold
neither for i nor for ~i. The following is a counterexample for T1g as
applied to general sentences. '~t' L-implies ~i, like every sentence
(T20-2h); but the converse does not hold because ~i is not L-false;
nevertheless, for both '~t' and ~i, m = 0. This yields also examples for D1c
and d. From what has just been said, it follows that ~i ⊃ ~t is not
L-true; but it is almost L-true, because m(~i ⊃ ~t) = m(i ∨ ~t) =
m(i) = 1. Therefore, ~i almost L-implies '~t'; and 't' almost L-implies i.
Furthermore, since ⊢ ~t ⊃ ~i, m(~t ⊃ ~i) = 1. Therefore, ~i and
'~t' are almost L-equivalent (T3c); likewise, i and 't'.

The following theorems T4a, b and T5a, b are chiefly of interest if 𝔏
is 𝔏_∞ and j is almost L-false, because otherwise j is L-false (T1b) and then
the assertions are obvious. Analogously, T5c, d are chiefly of interest if
𝔏 is 𝔏_∞ and j is almost L-true.

T58-4. Let m be a regular m-function for the sentences of a system 𝔏
(finite or infinite), and let m(j) = 0.

a. m(j . i) = 0. (From T57-1i.)

b. m(j ∨ i) = m(i). (From T57-1k, (a).)

T58-5. Let m be a regular m-function for the sentences of 𝔏; let i, i', j,
and j' be sentences in 𝔏 such that i' is formed from i by replacing one
or several (not necessarily all) occurrences of j in i with j'.

a. If m(j) = 0 and j' is '~t', then m(i) = m(i').

Proof. m(i) = m(j ∨ i) (T4b), = m(j ∨ i') (T23-6b, T57-1a), = m(i') (T4b).

b. If m(j) = m(j') = 0, then m(i) = m(i'). (From (a).)

c. If m(j) = 1 and j' is 't', then m(i) = m(i').

Proof. m(i) remains unchanged if i is transformed as follows. First we
replace the occurrences in question of j by ~~j (T23-1b, T57-1a); then, at these
places, we replace ~j, whose m-value is 0 (T57-1p), by '~t' (a). Thus, the
original j is replaced by '~~t'; for this, we put 't' (T23-1b, T57-1a).

d. If m(j) = m(j') = 1, then m(i) = m(i'). (From (c).)

T5b and d are inductive theorems of replacement, analogous to the
deductive theorem T23-1b. They say that m(i) remains unchanged if
any subsentence in i is replaced by any other sentence such that either
both have the m-value 0 or both have the m-value 1. While m(i) may
have any value whatever, for the sentences exchanged it is not sufficient
that they have equal m, but it is required that their m has one of the two
extreme values. [The necessity of this restriction is shown by the
following counterexample. Let j be a factual sentence such that m(j) = m(~j),
and hence both values are 1/2. (For the function m* to be introduced
later, this holds, e.g., for every atomic sentence.) Then, m(j ∨ ~j) = 1
(T57-1d); on the other hand, m(j ∨ j) = m(j) (T57-1a), = 1/2.]
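
The counterexample can be reproduced numerically. The sketch below assumes a toy system with a single atomic sentence j and two equally weighted state-descriptions, so that m(j) = m(~j) = 1/2; the labels and names are hypothetical.

```python
from fractions import Fraction

# Two state-descriptions of a toy system with the single atomic sentence j;
# equal weights are an assumption, chosen so that m(j) = m(~j) = 1/2.
WEIGHTS = {"j holds": Fraction(1, 2), "j fails": Fraction(1, 2)}
ALL = frozenset(WEIGHTS)

def m(r):
    """m-value of a sentence, identified here with its range r."""
    return sum((WEIGHTS[z] for z in r), Fraction(0))

j = frozenset({"j holds"})
not_j = ALL - j

before = m(j | not_j)   # m(j ∨ ~j)
after = m(j | j)        # ~j replaced by j, although m(~j) = m(j)
```

The replaced and the replacing sentence have equal m-values, yet the m-value of the whole sentence changes, because neither value is 0 or 1.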


§ 59. Theorems on Regular c-Functions


Some theorems concerning regular c-functions are stated, among them
fundamental theorems of the classical theory, e.g., the general and the special
addition theorems (T1k and l) and the general multiplication theorem (T1n).
One important result is as follows. In general, if, for a given pair of sentences
h,e, we choose arbitrarily a real number r between 0 and 1, then we can find a
regular c such that c(h,e) = r (T5f). This shows that the class of regular
c-functions contains not only concepts which may be regarded as adequate
explicata for probability_1 but also concepts which are entirely inadequate.
Therefore, all those theories of probability_1 which contain only theorems valid
for all regular c-functions are very weak.

In this and the two subsequent sections, theorems are stated which hold
for all regular c-functions. Among them are the most fundamental
theorems of inductive logic. Many of these theorems are well known, either
to be found in modern theories on probability_1 (e.g., the systems of
Keynes, Jeffreys, Hosiasson, and others; see § 62), or already in the
classical theory. The present section contains theorems of a very general
nature. The next two sections will deal with the complex of problems
traditionally associated with the name of Bayes.
All theorems of §§ 59, 60, and 61; with the sole exception of Ts9-5, 
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hold for all finite and infinite systems (that is to say, they hold both if c 
is any regular c-function ye for &y, and also if ¢ is any regular c-function 
œt for lo corresponding to any sequence of regular c-functions for the 
finite systems), provided the following conditions (A), (B), and (C) are 
fulfilled. 
A. For Qy. The sentences occurring belong to Qy; and no sentence oc- 
curring as evidence (i.e., as second argument of c) is L-false in Qy. 
B. For lo. The sentences occurring belong to Qo; and the arguments 
of c are such that every c-expression occurring has a value (in other 
words, the arguments fulfil the conditions stated in T56-2). 
C. The value of every c-expression occurring as a denominator is posi- 
tive. 
Some of the proofs or references indicating proofs are divided into (I) 
and (II); in these cases, (I) concerns ye, (II) ac. 
T59-1.
a. 0 ≤ c(h,e) ≤ 1. (I. From T55-2a, T57-1g and i. II. From D56-2.)
+b. If ⊢ e ⊃ h, c(h,e) = 1. (I. From T55-2a, T20-2l, T57-1a. II. From
T20-11, T40-21d.) (The converse of (b) holds only in a restricted
way, see T5b.)
c. If h is L-true, c(h,e) = 1. (From (b).)
d. c(t,e) = 1. (From (c).)
+e. If ⊢ e ⊃ ~h (in other words, e . h is L-false, e and h are
L-exclusive), then c(h,e) = 0. (I. From T55-2a, D55-2a. II. From T20-11,
T40-21d.)
f. If h is L-false, c(h,e) = 0. (From (e).)
g. c(~t,e) = 0. (From (f).)
+h. L-equivalent evidences. If ⊢ e₁ ≡ e₂ (e₁ and e₂ are L-equivalent),
c(h,e₁) = c(h,e₂). (I. From T57-1a. II. From T40-21e.)
+i. L-equivalent hypotheses. If ⊢ h₁ ≡ h₂ (h₁ and h₂ are L-equivalent),
c(h₁,e) = c(h₂,e). (I. From T57-1a. II. From T40-21e.)
+k. General addition theorem. c(h V i,e) = c(h,e) + c(i,e) − c(h . i,e).
(I. From T55-2a, T57-1k. II. From T40-21a and b.)
+l. Special addition theorem. Let c(h . i,e) = 0. (This condition is
fulfilled in particular if e . h . i is L-false (e), hence also if h . i is
L-false.) Then c(h V i,e) = c(h,e) + c(i,e). (From (k).)
+m. Special addition theorem for multiple disjunction. If the sentences
h₁, h₂, ..., hₙ (n ≥ 2) are L-exclusive in pairs with respect to e (hence
always if these sentences are L-exclusive in pairs), then
c(h₁ V h₂ V ... V hₙ,e) = Σ_{p=1}^{n} c(hₚ,e). (From (l), by mathematical
induction.)
n. General multiplication theorem.

(1) c(h . i,e) = c(h,e) × c(i,e . h).
(2)            = c(i,e) × c(h,e . i).

Proof. I. Since e . h is not L-false (A), m(e . h) > 0 (T58-1a). Therefore,
m(e . h . i)/m(e) = [m(e . h)/m(e)] × [m(e . h . i)/m(e . h)]. Hence (1)
(with T55-2a). (2) from (1) with (i). II. From T40-21c.


p. c(~h,e) = 1 − c(h,e).

Proof. c(h,e) + c(~h,e) = c(h V ~h,e) (l), = 1 (c). Hence the theorem.


q. c(i . j,e) = c(i,e) + c(j,e) − c(i V j,e). (From (k).)
r. If ⊢ e ⊃ i V j (hence, in particular, if i and j are L-disjunct),
c(i . j,e) = c(i,e) + c(j,e) − 1. (From (q), (b).)
s. c(i V j,e) = c(i,e) + c(j . ~i,e). (From (i), T21-5p(3), (l).)
t. Let the sentences h₁, h₂, ..., hₙ (n ≥ 2) be L-exclusive in pairs
with respect to e (this is the case, in particular, if they are
L-exclusive in pairs) and L-disjunct with respect to e (this is the case,
in particular, if they are L-disjunct), then Σ_{p=1}^{n} c(hₚ,e) = 1.
(From (m), (b).)
u. c(h,e) = c(h V ~e,e). (From (k), (e).)


v. Let h be the conjunction h₁ . h₂ . ... . hₙ (n ≥ 2). Let c(h₁,e) =
c(h₂,e . h₁) = c(h₃,e . h₁ . h₂) = ... = c(hₙ,e . h₁ . h₂ . ... . hₙ₋₁).
Then c(h,e) = [c(h₁,e)]ⁿ.

Proof. By repeated application of (n)(1), c(h,e) = c(h₁,e) × c(h₂,e . h₁) ×
c(h₃,e . h₁ . h₂) × ... × c(hₙ,e . h₁ . h₂ . ... . hₙ₋₁).


w. Let h . i be L-false. Then c(h,h V i) + c(i,h V i) = 1. (From (l), (b).)
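
The addition, multiplication, and negation theorems of T59-1 can be spot-checked on a small finite system. The sketch below is illustrative only: it assumes a toy language with three atomic sentences, an arbitrarily chosen regular m (positive weights 1, ..., 8 normalised to sum 1), and helper names (m, c, conj, disj, neg) that are not the book's notation. c is computed as m(e . h)/m(e), as in T55-2a.

```python
from fractions import Fraction
from itertools import product

# Eight state-descriptions of a toy language with three atomic sentences.
STATES = list(product([True, False], repeat=3))

# Hypothetical regular m: every state-description gets a positive weight.
TOTAL = sum(range(1, len(STATES) + 1))
WEIGHT = {s: Fraction(k, TOTAL) for k, s in enumerate(STATES, start=1)}

def m(sentence):
    """Sum of the weights of the state-descriptions in the range of the sentence."""
    return sum((WEIGHT[s] for s in STATES if sentence(s)), Fraction(0))

def c(h, e):
    """Degree of confirmation c(h,e) = m(e . h) / m(e); e must not be L-false."""
    m_e = m(e)
    if m_e == 0:
        raise ValueError("the evidence must not be L-false")
    return m(lambda s: e(s) and h(s)) / m_e

def conj(a, b):
    return lambda s: a(s) and b(s)

def disj(a, b):
    return lambda s: a(s) or b(s)

def neg(a):
    return lambda s: not a(s)

h = lambda s: s[0]
i = lambda s: s[1]
e = lambda s: s[0] or s[2]

# General addition theorem (k).
addition_ok = c(disj(h, i), e) == c(h, e) + c(i, e) - c(conj(h, i), e)
# General multiplication theorem (n), forms (1) and (2).
multiplication_ok = (
    c(conj(h, i), e) == c(h, e) * c(i, conj(e, h)) == c(i, e) * c(h, conj(e, i))
)
# Negation theorem (p).
negation_ok = c(neg(h), e) == 1 - c(h, e)
```

Exact rational arithmetic is used so that the equalities can be tested without rounding error; any other positive weights would serve equally well.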
T59-2.
a. c(i,e) = c(h . i,e) + c(~h . i,e).

Proof. i is L-equivalent to (h . i) V (~h . i) (T21-5j(2)). Hence the
theorem with T1i and l.

b. c(i,e) = c(h,e) × c(i,e . h) + c(~h,e) × c(i,e . ~h). (From (a), T1n.)
d. If ⊢ h₁ ⊃ h₂ or ⊢ e . h₁ ⊃ h₂, then c(h₁,e) ≤ c(h₂,e).

Proof. I. ⊢ e . h₁ ⊃ e . h₂. Hence, for every regular m,
m(e . h₁) ≤ m(e . h₂) (T57-1h). Hence the theorem (with T55-2a). II. From
T20-11, T40-21f.

e. c(h . i,e) ≤ c(h,e). (From (d).)
f. c(h,e) ≤ c(h V i,e). (From (d).)
g. c(h V i,e) ≤ c(h,e) + c(i,e). (From T1k.)


h. If ⊢ e . h ⊃ j, then c(h . j,e) = c(h,e).

Proof. I. ⊢ e . h . j ≡ e . h (T20-2l). Therefore, m(e . h . j) = m(e . h)
(T57-1a). Hence the theorem (with T55-2a). II. From T20-11, T40-21e.

i. c(h . e,e) = c(h,e). (From (h).)
j. If ⊢ e . h₁ ⊃ h₂ and ⊢ e . h₂ ⊃ h₁ (in other words, if ⊢ e ⊃ (h₁ ≡ h₂),
h₁ is L-equivalent to h₂ with respect to e), then c(h₁,e) = c(h₂,e).

Proof. I. m(e . h₁) = m(e . h₂) (T57-1a). Hence the theorem (with T55-2a).
II. From T20-11, T40-21e.

k. Let m be the regular m-function corresponding to c (for 𝔏_N or 𝔏_∞;
for 𝔏_∞, let m have a value for h). If ⊢ h ⊃ e and m(e) > 0, then
c(h,e) = m(h)/m(e).

Proof. Since ⊢ h ⊃ e, ⊢ e . h ≡ h (T20-21). Therefore, m(e . h) = m(h)
(T57-1a). Hence (I) from T55-2a; (II) from T56-4a.

l. Let m be the regular m-function corresponding to c (for 𝔏_N or 𝔏_∞;
for 𝔏_∞, let m have a value for e . h and a positive value for e). Then
c(he) = mem Eh. (From T57-1r, (I) T55-2a; (II) T56-4a.) 

m. c(h,e₁ V e₂) = c(h . e₁,e₁ V e₂) + c(h . e₂,e₁ V e₂) −
c(h . e₁ . e₂,e₁ V e₂). (From (i), T1i, T1k.)

n. If e₁ . e₂ is not L-false, c(h,e₁ V e₂) = c(h,e₁) × c(e₁,e₁ V e₂) +
c(h,e₂) × c(e₂,e₁ V e₂) − c(h,e₁ . e₂) × c(e₁ . e₂,e₁ V e₂). (From (m),
T1n(2), T21-5p(2).)

o. If h . e₁ . e₂ is L-false, then c(h,e₁ V e₂) = c(h,e₁) × c(e₁,e₁ V e₂) +
c(h,e₂) × c(e₂,e₁ V e₂). (From (m), T1f.)

p. If e₁ . e₂ is L-false, then c(h,e₁ V e₂) = c(h,e₁) × c(e₁,e₁ V e₂) +
c(h,e₂) × (1 − c(e₁,e₁ V e₂)). (From (o), T1w.)

q. Let e₁ . e₂ be L-false. Let r be the maximum and r’ the minimum of the
values c(h,e₁) and c(h,e₂). (In the case of 𝔏_∞, it is assumed that the
values c(e₁,e₁ V e₂) and c(e₂,e₁ V e₂) exist.) Then r’ ≤ c(h,e₁ V e₂) ≤ r.

Proof. 1. Let c(h,e₁) = c(h,e₂). Then c(h,e₁ V e₂) = c(h,e₁) (p), = r = r’.
2. Let c(h,e₁) > c(h,e₂); hence the first is r, the second r’. Let
c(e₁,e₁ V e₂) = q. Then c(h,e₁ V e₂) = rq + r’(1 − q) (p),
= r’ + q(r − r’) ≥ r’. Similarly we obtain: c(h,e₁ V e₂) ≤ r (from (p),
with e₁ and e₂ interchanged). 3. Let c(h,e₁) < c(h,e₂). The proof is
analogous to (2).
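
The bound in (q) rests on the fact that, for L-exclusive e₁ and e₂, c(h,e₁ V e₂) is a weighted mean of c(h,e₁) and c(h,e₂) with weights q and 1 − q, where q = c(e₁,e₁ V e₂). A minimal sketch of this computation follows; the function name and the numerical values are hypothetical and serve only as illustration.

```python
from fractions import Fraction

def c_on_disjunction(c_h_e1, c_h_e2, q):
    """c(h, e1 V e2) for L-exclusive e1, e2, where q = c(e1, e1 V e2)."""
    if not 0 <= q <= 1:
        raise ValueError("q is a c-value and must lie between 0 and 1")
    return c_h_e1 * q + c_h_e2 * (1 - q)

# Hypothetical values: r is the larger and r_prime the smaller of the two c-values.
r, r_prime = Fraction(9, 10), Fraction(1, 5)

# Sweep q over 0, 1/10, ..., 1 and check that the result never leaves [r', r].
values = [c_on_disjunction(r, r_prime, Fraction(k, 10)) for k in range(11)]
within_bounds = all(r_prime <= v <= r for v in values)

example = c_on_disjunction(r, r_prime, Fraction(1, 4))
```

With q = 1/4 the example gives 9/40 + 6/40 = 3/8, which lies between 1/5 and 9/10 as the theorem requires.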


The following theorem deals with multiple disjunctions; it makes use
of the special addition theorem for multiple disjunction (T1m).
T59-3. Let j be j₁ V j₂ V ... V jₙ (n ≥ 2).
a. Let ⊢ e . h ⊃ j. (This condition is always fulfilled if ⊢ j, in other
words, if j₁, j₂, ..., jₙ are L-disjunct.) Let the sentences e . h . j₁,
..., e . h . jₙ be L-exclusive in pairs. (This condition is always
fulfilled if j₁, j₂, ..., jₙ are L-exclusive in pairs.) Then

(1) c(h,e) = Σ_{p=1}^{n} c(h . jₚ,e);
(2)        = Σ_{p=1}^{n} [c(jₚ,e) × c(h,e . jₚ)].

Proof. (1). e . h is L-equivalent to e . h . j (T20-21). c(h,e) =
c(e . h,e) (T2i), = c(e . h . j,e) (T1i), = c(h . j,e) (T2i),
= c(h . (j₁ V ... V jₙ),e) = c((h . j₁) V ... V (h . jₙ),e) (by
distribution). Hence (1) by T1m. (2) from (1) by T1n(2).

b. Let h have the same c on each of the sentences e . j₁, ..., e . jₙ as
evidence. Let these sentences e . j₁, ..., e . jₙ be L-exclusive in
pairs. Let c(j,e) > 0. Then c(h,e . j) = c(h,e . jₘ) for any m (from 1
to n).

Proof. h . j is L-equivalent (by distribution) to (h . j₁) V (h . j₂) V ...
V (h . jₙ). Therefore (T1m), c(h . j,e) = Σ c(h . jₚ,e) =
Σ [c(jₚ,e) × c(h,e . jₚ)] (T1n(2)). For any m (from 1 to n), the second
factor in each term of the last-mentioned sum equals c(h,e . jₘ).
Therefore, c(h . j,e) = c(h,e . jₘ) × Σ c(jₚ,e). The latter sum equals
c(j,e) (T1m). Thus, c(h . j,e) = c(h,e . jₘ) × c(j,e). On the other hand
(T1n(2)), c(h . j,e) = c(j,e) × c(h,e . j). By forming an equation of the
two right-hand sides and dropping the factor c(j,e), since it is assumed
to be positive, we obtain the theorem.

c. (Corollary to (b).) Let h have the same c on each of the sentences
j₁, ..., jₙ as evidence. Let the sentences j₁, ..., jₙ be L-exclusive
in pairs. Let c₀(j) > 0. Then c(h,j) = c(h,jₘ) for any m (from 1 to n).
(From (b), with ‘t’ as e. It follows also from (d).)

d. Let the sentences j₁, ..., jₙ be L-exclusive in pairs. Let r be the
maximum and r’ the minimum of the values c(h,jₚ) (p = 1, ..., n).
(In the case of 𝔏_∞, it is assumed that the values c(jₚ,j) exist.) Then
r’ ≤ c(h,j) ≤ r. (From T2q, by mathematical induction with respect
to n.)


It is clear that a chain of inferences is valid in deductive logic in this 
sense: if we can deduce each, except the first, in a series of sentences from 
the preceding one, then the last one is deducible from the first. This holds 
because of the transitivity of the relation of L-implication: if i L-implies 
j and j L-implies k, then i L-implies k (T20-2b); the theorem concerning 
a series of any length follows from this by mathematical induction. Let us 
now examine the problem whether an analogous procedure is valid in
inductive logic. To a superficial inspection it might appear as if
inferences of the following kind were not only frequently used in everyday
life but also valid: suppose that on the basis of given evidence e, the
hypothesis h₁ is highly probable and that h₁ gives high probability to h₂;
then h₂ is highly probable on e. But in this form the chain of inference is
not generally valid. Although the relation of a high degree of
confirmation is in certain respects similar to L-implication, it is not
transitive. There are cases in which c(h₁,e) and c(h₂,h₁) are both very
high and, nevertheless, c(h₂,e) is very low or even zero. This holds if
the ranges of the sentences involved, measured by any given m, fulfil the
following conditions: R(e . h₁) is a large part of R(e) but only a small
part of R(h₁); that makes it possible for R(h₂) to cover a large part of
R(h₁) without overlapping with R(e). [Example. Let m(e . h₁ . h₂) = 0;
m(e . h₁ . ~h₂) = 0.000,049,95; m(e . ~h₁ . h₂) = 0;
m(e . ~h₁ . ~h₂) = 0.000,000,05; m(~e . h₁ . h₂) = 0.0999;
m(~e . h₁ . ~h₂) = 0.000,050,05. The six sentences mentioned are
L-exclusive in pairs (D20-1g); hence their ranges are mutually exclusive.
Since e . ~h₁ is L-equivalent to e . ~h₁ . h₂ V e . ~h₁ . ~h₂
(T21-5m(1), T21-5s(1)), m(e . ~h₁) is the sum of the m-values of the
disjunctive components (T57-1m), hence 0.000,000,05. In a similar way
it is found that m(e . h₁) = 0.000,049,95; m(e) = 0.000,05; hence
c(~h₁,e) = 0.001 (D55-3), and c(h₁,e) = 0.999 (T1p). Further,
m(h₁ . h₂) = 0.0999; m(h₁ . ~h₂) = 0.0001; m(h₁) = 0.1; hence
c(h₂,h₁) = 0.999. On the other hand, m(e . h₂) = 0; hence c(h₂,e) = 0.]
Are then the chains of inductive reasoning customarily made in everyday
life, in law courts, and in science invalid? I think that many of them can
be defended as valid. They are valid if they do not have the simple form
mentioned above but rather the following cumulative form: The evidence e
available to X gives strong confirmation to h₁; therefore X believes h₁
together with e; h₂ is highly probable with respect to e . h₁ (but not
necessarily with respect to h₁ alone); therefore X regards h₂ as highly
confirmed also by his actual evidence e. This is a valid procedure. The
chain may even be longer: X believes now in e . h₁ . h₂; if this gives a
high probability to h₃, then he adds h₃ to his belief, etc. Generally
speaking, if the values c(h₁,e), c(h₂,e . h₁), c(h₃,e . h₁ . h₂), etc.,
are all very high, then the c of the last hypothesis hₙ on e is also high
or at least fairly high, provided the chain is not too long. This follows
from the subsequent theorem T4b, which says that c(hₙ,e) is at least as
high as the product of the afore-mentioned values.
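
The numerical example of non-transitivity can be recomputed exactly. The sketch below is an illustration under stated assumptions: the six m-values are those given in the example; the two cells ~e . ~h₁ . h₂ and ~e . ~h₁ . ~h₂ are not specified in the text, so the remaining weight is split equally between them (an arbitrary choice that does not affect the three c-values computed); the names M, m, and c are illustrative.

```python
from fractions import Fraction

F = Fraction

# Keys are the truth-values of (e, h1, h2); values are the m-values of the example.
M = {
    (True, True, True): F(0),
    (True, True, False): F("0.00004995"),
    (True, False, True): F(0),
    (True, False, False): F("0.00000005"),
    (False, True, True): F("0.0999"),
    (False, True, False): F("0.00005005"),
}

# Assumption: the text leaves the two ~e . ~h1 cells open; split the rest equally.
rest = 1 - sum(M.values())
M[(False, False, True)] = rest / 2
M[(False, False, False)] = rest / 2

def m(pred):
    """Sum of the m-values of the cells in which pred holds."""
    return sum((w for cell, w in M.items() if pred(*cell)), F(0))

def c(h, e):
    """c(h,e) = m(e . h) / m(e)."""
    return m(lambda *v: e(*v) and h(*v)) / m(e)

E = lambda e, h1, h2: e
H1 = lambda e, h1, h2: h1
H2 = lambda e, h1, h2: h2

c_h1_e = c(H1, E)    # confirmation of h1 on e
c_h2_h1 = c(H2, H1)  # confirmation of h2 on h1
c_h2_e = c(H2, E)    # confirmation of h2 on e
```

The first two values come out as 999/1000 while the third is 0, so a high c(h₁,e) together with a high c(h₂,h₁) does not guarantee a high c(h₂,e).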


+T59-4. A cumulative confirmation chain.
a. For any n ≥ 2, c(h₁ . h₂ . ... . hₙ,e) = c(h₁,e) × c(h₂,e . h₁) ×
c(h₃,e . h₁ . h₂) × ... × c(hₙ,e . h₁ . h₂ . ... . hₙ₋₁).

Proof. 1. The assertion holds for n = 2 (T1n(1)). 2. Let kₙ be the
conjunction of the n h-sentences, and kₙ₋₁ that of the first n − 1 of
them; hence kₙ is kₙ₋₁ . hₙ. Let Πₙ be the n-term product in the theorem,
and Πₙ₋₁ the product of its first n − 1 terms; hence
Πₙ = Πₙ₋₁ × c(hₙ,e . kₙ₋₁). Let us assume that the assertion holds for
n − 1, that is to say, c(kₙ₋₁,e) = Πₙ₋₁; we shall show that it holds then
likewise for n. From T1n(1): c(kₙ,e) = c(kₙ₋₁,e) × c(hₙ,e . kₙ₋₁) =
Πₙ₋₁ × c(hₙ,e . kₙ₋₁) (according to our assumption) = Πₙ. 3. The assertion
for every n ≥ 2 follows from (1) and (2) by mathematical induction.

b. For any n ≥ 2, c(hₙ,e) ≥ c(h₁,e) × c(h₂,e . h₁) × c(h₃,e . h₁ . h₂) ×
... × c(hₙ,e . h₁ . h₂ . ... . hₙ₋₁). (From (a), T2e.)
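
Both parts of the chain theorem can be illustrated on a small finite system. This is a sketch under assumptions, not part of the system itself: the toy language has four atomic sentences standing for e, h₁, h₂, h₃; the regular m is chosen arbitrarily (positive weights 1, ..., 16, normalised); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# State-descriptions: truth-values of (e, h1, h2, h3).
STATES = list(product([True, False], repeat=4))
TOTAL = sum(range(1, len(STATES) + 1))
# Hypothetical regular m with positive weights on all state-descriptions.
WEIGHT = {s: Fraction(k, TOTAL) for k, s in enumerate(STATES, start=1)}

def m(sentence):
    return sum((WEIGHT[s] for s in STATES if sentence(s)), Fraction(0))

def c(h, e):
    """c(h,e) = m(e . h) / m(e)."""
    return m(lambda s: e(s) and h(s)) / m(e)

def atom(k):
    return lambda s: s[k]

def conj(*parts):
    return lambda s: all(p(s) for p in parts)

e, h1, h2, h3 = (atom(k) for k in range(4))

# Factors of the cumulative chain: each hypothesis is confirmed on the
# evidence together with all earlier hypotheses.
factors = [c(h1, e), c(h2, conj(e, h1)), c(h3, conj(e, h1, h2))]
chain_product = factors[0] * factors[1] * factors[2]

c_conjunction = c(conj(h1, h2, h3), e)  # part (a): equals the product
c_last = c(h3, e)                       # part (b): at least the product

# If every factor were 0.99, the lower bound for a chain of three would be:
bound_three_links = Fraction(99, 100) ** 3
```

The last line shows why the chain must not be too long: three links of 0.99 still guarantee about 0.97, but the guaranteed bound shrinks with every further factor.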


The following theorem T5 is, like T58-1, on which it is based,
restricted so as not to apply to general sentences in 𝔏_∞.
T59-5. The subsequent assertions (a) to (f) hold under each of the
following two assumptions:
(i) Let c be a regular c-function for 𝔏_N, and let e and h be any
sentences in 𝔏_N such that e is not L-false in 𝔏_N.
(ii) Let c be a regular c-function for 𝔏_∞ (D56-2) corresponding to any
fitting c-sequence (D57-5), and let e and h be any nongeneral sentences
in 𝔏_∞ such that e is not L-false in 𝔏_∞. (Hence, e is not L-false in
𝔏_N (T20-10b), and c(h,e) has a value in 𝔏_∞ which is the same as that
in 𝔏_N (T57-6c).)

The proofs refer to (i); then (ii) follows with T57-6c.


a. If not ⊢ e ⊃ h, then c(h,e) < 1.

Proof. If not ⊢ e ⊃ h, then not ⊢ e ⊃ e . h. However, ⊢ e . h ⊃ e;
therefore m(e . h) < m(e) (T58-1g). Hence the theorem (with T55-2a).

b. If c(h,e) = 1, then ⊢ e ⊃ h. (From (a).)
c. If c(h,e) = 0, then ⊢ e ⊃ ~h, in other words, e . h is L-false, e and h
are L-exclusive. (From T1p, (b).)
d. If e . h is not L-false (in other words, not ⊢ e ⊃ ~h), c(h,e) > 0.
(From (c), T1a.)
e. (1) If c(h,e) = 0, then c(e,h) = 0. (From (c), T1e.)
(2) If c(h,e) > 0, then c(e,h) > 0. (From (e1).)
(3) If c(h,e) = 1, then c(~e,~h) = 1 and c(e,~h) = 0. (From (b),
T1b, T1p.)


f. Let h be not L-dependent upon e (i.e., neither ⊢ e ⊃ h nor ⊢ e ⊃ ~h);
let r be an arbitrary real number such that 0 < r < 1. Then there
is a regular c (and, moreover, in general, an infinite number of
them) such that c(h,e) = r.

Proof. Let j₁ be ~e, j₂ be e . h, j₃ be e . ~h. These three sentences are
L-disjunct (i.e., ⊢ j₁ V j₂ V j₃, D20-1e) and L-exclusive in pairs
(D20-1g); hence every state-description ℨ belongs to exactly one of the
three ranges. Under the conditions stated, j₂ and j₃ are not L-false. Let
m be any regular m-function. Then m(j₁) + m(j₂) + m(j₃) = 1 (T57-1n and
d). e is L-equivalent to j₂ V j₃; hence m(e) = m(j₂) + m(j₃) (T57-1a and
m). Therefore (T55-2a) c(h,e) = m(j₂)/m(e) = m(j₂)/(m(j₂) + m(j₃)). Now we
distinguish two cases, I and II. I. Let j₁ be L-false. Then ⊢ e, hence
m(e) = 1. In this case, we choose any regular m such that m(j₂) = r. (This
is possible according to T58-1l.) Then c(h,e) = r. II. Let j₁ be not
L-false. Then we choose any q such that 0 < q < 1 and any regular m such
that m(j₁) = q, m(j₂) = (1 − q)r, and m(j₃) = (1 − q)(1 − r). (An m of
this kind can easily be constructed; the sum of the three values just
stated is 1; all we have to do is to divide each of the three values in an
arbitrary way, e.g., in equal amounts, among the ℨ of the range of the
sentence in question.) Then c(h,e) = (1 − q)r/((1 − q)r + (1 − q)(1 − r))
= r.
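
Case II of the proof is constructive and can be carried out directly. The following is a minimal sketch under assumptions: the toy language has two atomic sentences, taken as e and h themselves; the function name build_m and the chosen values of r and q are hypothetical; each of the three values is divided in equal amounts among the state-descriptions of the corresponding range, as the proof suggests.

```python
from fractions import Fraction
from itertools import product

def build_m(r, q, states, e, h):
    """Case II construction: m(~e) = q, m(e.h) = (1-q)r, m(e.~h) = (1-q)(1-r)."""
    if not (0 < r < 1 and 0 < q < 1):
        raise ValueError("r and q must lie strictly between 0 and 1")
    j1 = [s for s in states if not e(s)]
    j2 = [s for s in states if e(s) and h(s)]
    j3 = [s for s in states if e(s) and not h(s)]
    if not (j1 and j2 and j3):
        raise ValueError("case II requires that none of the three ranges is empty")
    weight = {}
    for cells, total in ((j1, q), (j2, (1 - q) * r), (j3, (1 - q) * (1 - r))):
        for s in cells:
            # Divide the value in equal amounts among the state-descriptions.
            weight[s] = total / len(cells)
    return weight

# Toy language: state-descriptions give the truth-values of (e, h).
STATES = list(product([True, False], repeat=2))
e = lambda s: s[0]
h = lambda s: s[1]

r = Fraction(1, 1_000_000)  # an implausibly small target value
q = Fraction(1, 2)          # arbitrary choice for m(~e)
weight = build_m(r, q, STATES, e, h)

m_e = sum(w for s, w in weight.items() if e(s))
m_e_h = sum(w for s, w in weight.items() if e(s) and h(s))
c_h_e = m_e_h / m_e
```

All weights are positive and sum to 1, so the resulting m is regular in the toy system, and c(h,e) comes out as exactly the prescribed r.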

T5b (part (ii), for 𝔏_∞) is a restricted converse of T1b, likewise T5c of
T1e. The reason for the restrictions is here, as in T58-1, the existence of
almost L-true and almost L-false sentences. Let _∞m correspond to _∞c
(i.e., be defined on the basis of the same m-sequence), and let i be almost
L-true (with respect to _∞m). Then i is general (T58-3d), and ~i is almost
L-false. As we found earlier (in connection with T58-1), ‘t’ almost
L-implies i but not ⊢ t ⊃ i. Nevertheless, _∞c(i,t) = 1 (T56-4a), because
_∞m(t) = 1 and _∞m(t . i) = _∞m(i) = 1. Hence, T5a and b do not hold
without restriction. Furthermore, since _∞m(~i) = 0, _∞c(~i,t) = 0, but
not ⊢ t ⊃ ~~i. Thus, T5c and d must be restricted. In general (provided
the m- and c-values involved exist), the following holds. If c(h,e) = 1,
then e either L-implies or almost L-implies h; if c(h,e) = 0, then e either
L-implies or almost L-implies ~h.

The following example shows the necessity of the restriction not in
terms of our technical concepts (regular m- and c-functions) but instead
in terms of usual conceptions concerning the explicandum, viz.,
probability₁. Let e say that all individuals except two of a given infinite
domain have the property M; and let h say that a certain individual b of
which nothing is known except that it belongs to the given domain has
the property M. [In our system 𝔏_∞, e is ‘(∃x)(∃y)[x ≠ y . ~Mx . ~My .
(z)(z ≠ x . z ≠ y ⊃ Mz)]’, and h is ‘Mb’.] Then I think that most
scientists who use a quantitative concept of probability₁ in cases of this
kind would ascribe to h on the evidence e the probability₁ 1 and,
consequently, to ~h the probability₁ 0. However, h is obviously not a
logical consequence of e, since the individual b may be one of the two
exceptions; and hence ~h is not logically incompatible with e. This shows
that, with respect to an infinite domain of individuals, probability₁ 1 is
not the same as certainty or necessity (in the sense of relative necessity
with respect to evidence e, in other words, logical entailment by e), as
has sometimes been assumed by earlier authors. Likewise, probability₁ 0 is
not the same as impossibility (in the sense of relative impossibility,
i.e., logical incompatibility). In both cases the inductive relation is
wider than the deductive one. Most modern authors recognize this
difference.

T5f corresponds to T58-1l. It shows that the class of regular c-functions
does by no means comprehend only those functions which may be regarded
as fairly adequate explicata for probability₁. On the contrary, this class
comprehends also functions which, in any given case, deviate from a value
which may appear as plausible to any extent in either direction. As an
example, let e be a conjunction of one thousand different atomic sentences
with the same predicate ‘P’, and h be another atomic sentence with ‘P’.
Thus, e reports that a thousand things have been observed and that all of
them had the property P; and h predicts that a new thing will likewise be
P. Now it is one of the characteristic features of inductive thinking that,
on the evidence of a high relative frequency of a certain kind among a
sufficient number of observed things, the probability₁ that the next thing
will belong to the same kind is high. (This has earlier been discussed;
see § 47A.) Thus a value of c(h,e) equal or close to 1 would appear as
plausible to most scientists, while a considerably lower value, e.g., 1/2,
would hardly seem acceptable to anyone. Now, at the present moment, we do
not assert that c must be close to 1 in this case. We merely call
attention to the fact that there are regular c-functions which have in this
case any value however small (but still positive), e.g., one-millionth.

We shall see later (in § 62) that those modern theories of probability₁
which, in distinction to the classical theory, do not contain something
similar to the principle of indifference state only such axioms and hence
such theorems as hold for all regular c-functions. T5f shows that these
systems do not effect a narrow selection of c-functions but admit, in
addition to adequate concepts, also concepts which are entirely inadequate
as explicata for probability₁. In a certain sense, we might even say that
these systems hardly make any selection at all, inasmuch as they admit
as basic m-function any distribution of arbitrary positive values with the
total amount 1 among the ℨ. This is no objection against these theories;
they are certainly correct as far as they go, because they state only those
properties of c which any adequate explicatum of probability₁ must
certainly have. Our result shows merely that these theories are very weak.
And thus it shows too how far from our aim we still are in the present
stage of our construction of a quantitative inductive logic; in other
words, how much remains to be done in order to restrict the very
comprehensive class of regular c-functions and finally to select one
c-function as explicatum. Anticipating later discussions, it may be
remarked that we shall reach the aim not by many small steps but by two big
steps. The first step will consist in selecting a special kind of regular
c-functions to be called the symmetrical c-functions (chap. viii). The
second step will lead, by way of one additional requirement, to the
function c* which will be proposed as explicatum (§ 110A).

The following theorems T6 deal with c-values 0 and 1. They are listed
here for reference purposes only. They are chiefly of interest with respect
to 𝔏_∞, especially for general sentences. For any sentences in finite
systems and for nongeneral sentences in 𝔏_∞, the c-values 0 and 1 coincide
with certain L-concepts, as we have seen (T5a, b, c, d); hence in this case
the following theorems follow directly from simple theorems in deductive
logic, e.g., T6a from the theorem ‘If e . h is L-false, e . h . i is
L-false’. In 𝔏_∞, however, c(h,e) may be 0 while e . h is not L-false but
only almost L-false; thus in this case the theorems are of interest. [Most
of the theorems in T6 have been stated by Keynes ([Probab.], pp. 140-46);
his proofs, however, are of little value because they hold only for finite
systems, since he identifies probabilities 0 and 1 with the corresponding
deductive concepts (see below, § 62).]

T59-6.

a. If c(h,e) = 0, c(h . i,e) = 0. (From T2e.)

b. If c(h,e) = 0 and c(i,e) > 0, then c(h,e . i) = 0.

Proof. c(h . i,e) = 0 (from (a)), = c(i,e) × c(h,e . i) (from T1n(2)).
Since the first factor is not 0, the second must be 0.

c. If c(h,e) = 1 and c(i,e) > 0, then c(h,e . i) = 1.

Proof. c(~h,e) = 0 (T1p). Hence c(~h,e . i) = 0 (from (b)). Hence the
assertion (T1p).

d. If c(h,e) = 1, then c(h . i,e) = c(i,e . h) = c(i,e).

Proof. (1) Let c(i,e) > 0. c(h . i,e) = c(i,e) × c(h,e . i) =
c(h,e) × c(i,e . h) (T1n). Since c(h,e) = 1, c(h,e . i) = 1 (c); hence the
assertion. (2) Let c(i,e) = 0. Then assertion from (a) and (b).

e. If c(h . i,e) = 1, then c(h,e) = 1 and c(i,e) = 1. (From T2e.)
f. If c(h₁ ⊃ h₂,e) = 1, c(h₁,e) ≤ c(h₂,e).

Proof. 1 = c(h₁ ⊃ h₂,e) = c(~h₁ V h₂,e) ≤ c(~h₁,e) + c(h₂,e) (from T2g),
= 1 − c(h₁,e) + c(h₂,e) (T1p). Hence 0 ≤ −c(h₁,e) + c(h₂,e). Hence the
assertion.

g. If c(h₁ ≡ h₂,e) = 1, c(h₁,e) = c(h₂,e).

Proof. h₁ ≡ h₂ is L-equivalent to (h₁ ⊃ h₂) . (h₂ ⊃ h₁). Therefore
c(h₁ ⊃ h₂,e) = 1 and c(h₂ ⊃ h₁,e) = 1 (e). Hence assertion by (f).


h. If c(h,e) = 1, then (1) c(i V h,e) = 1. (From T2f.)
(2) c(i ⊃ h,e) = 1. (From (1).)

i. If c(i ≡ j,e) = 1, c(i,e) > 0, and c(j,e) > 0, then c(h,e . i) =
c(h,e . j).

Proof. T1n(2) yields these two equations: (1) c(h . i,e) =
c(i,e) × c(h,e . i), (2) c(h . j,e) = c(j,e) × c(h,e . j).
c(h ⊃ (i ≡ j),e) = 1 (from (h2)). Hence c(h . i ≡ h . j,e) = 1 (T21-5o).
Hence the left-hand sides in (1) and (2) are equal (g). Therefore the
right-hand sides are equal. Since the first factors are equal (g) and
positive, the second factors are equal.


j. If c(i ≡ j,e) = 1 and c(k,e) > 0, then c(i,e . k) = c(j,e . k).

Proof. T1n yields these two equations: (1) c(k,e) × c(i,e . k) =
c(i,e) × c(k,e . i), (2) c(k,e) × c(j,e . k) = c(j,e) × c(k,e . j). On the
right-hand sides the first factors are equal (g), and likewise the second
factors (i). Therefore the right-hand sides are equal, and hence the
left-hand sides. Hence the assertion, since c(k,e) is positive.

k. If c(h,e) = 0 and c(i,e) > 0, then c(h . e,i) = 0.

Proof. c(h,e . i) = 0 (b). From T1n(2): c(h . e,i) = c(e,i) × c(h,e . i)
= 0.

l. If c(h . i,e) = 0 and c(i,e) > 0, then c(h,e . i) = 0. (From T1n(2).)
m. If c(h,e) = 1 and c(i,e) > 0, then c(e ⊃ h,i) = 1.

Proof. c(~h,e) = 0 (T1p). Hence c(~h . e,i) = 0 (k). Hence
c(~(~h . e),i) = 1 (T1p). Hence the assertion (T21-5g(1)).

n. If c(i ⊃ h,e) = 1 and c(i,e) > 0, then c(h,e . i) = 1.

Proof. c(i . ~h,e) = 0 (T21-5g(2), T1p). Therefore c(~h,e . i) = 0 (l).
Hence the assertion by T1p.

o. If c(i ⊃ (h₁ ≡ h₂),e) = 1 and c(i,e) > 0, then c(h₁,e . i) =
c(h₂,e . i). (From (n), (g).)


p. Let c(h,i) = 1 and c(h,j) = 0. Then
(1) either c(i,j) or c(j,i) is 0;
(2) if c(e,i) > 0 and c(e,j) > 0, c(i . j,e) = 0.

Proof. (1). From T1n: c(i,j) × c(h,j . i) = c(h,j) × c(i,j . h). Since
c(h,j) = 0, the left-hand product is 0 and hence at least one of its
factors is 0. Again from T1n: c(j,i) × c(~h,i . j) = c(~h,i) × c(j,i . ~h).
c(~h,i) = 0 (T1p). Therefore the left-hand product is 0 and hence at least
one of its factors is 0. Thus (1) holds, because otherwise c(h,i . j) = 0
and simultaneously c(~h,i . j) = 0, hence c(h,i . j) = 1 (T1p), which is
impossible. (2). From (1) by (k).

q. If c(h,e) = 0, c(e,i) = 1, and c(i,e) > 0, then c(h,i) = 0.

Proof. c(h . e,i) = 0 (k). Therefore c(h,i) × c(e,i . h) = 0 (T1n(1)).
Hence c(h,i) = 0, because otherwise c(e,i . h) would be 1 (c), which leads
again to c(h,i) = 0.

r. If c(h,e) = 0, c(h,~e) = 0, c(i,e) > 0, and c(i,~e) > 0, then
c(h,i) = 0.

Proof. c(h . e,i) = 0 (k), and c(h . ~e,i) = 0 (k). c(h,i) is the sum of
these two c-values (T2a) and hence is 0.

s. If c(h V i,e) = 0, then c(h,e) = 0. (From T2f.)
t. If c(h,e) = 0 and c(i,e) = 0, then c(h V i,e) = 0. (From (a), T1k.)
u. If c(h,e) = 0, then c(h ⊃ i,e) = 1. (From (h(1)), with ~h for h,
and T1p.)
v. If c(h,e₁ V e₂) = 0 and c(e₁,e₁ V e₂) > 0, then c(h,e₁) = 0.

Proof. c(h . (e₁ V e₂),e₁) = 0 (k). Hence c(h . e₁,e₁) = 0 (s). Hence the
assertion by T2i.

w. If c(h,e₁ V e₂) = 1 and c(e₁,e₁ V e₂) > 0, then c(h,e₁) = 1. (From (v)
with ~h for h, and T1p.)
x. If c(h,e₁) = 0 and c(h,e₂) = 0, then c(h,e₁ V e₂) = 0.

Proof. c(h . e₁,e₁ V e₂) = 0 (k), since c(e₁ V e₂,e₁) = 1 (T1b), > 0.
Analogously, c(h . e₂,e₁ V e₂) = 0. Therefore c(h . (e₁ V e₂),e₁ V e₂) = 0
(from (t), T21-5m(1)). Hence assertion by T2i.

y. If c(h,e₁) = 1 and c(h,e₂) = 1, then c(h,e₁ V e₂) = 1. (From (x) with
~h for h, and T1p.)


§ 60. Confirmation of Hypotheses by Observations: Bayes’s Theorem


This section deals with the following situation. e formulates our present
knowledge, say, a report on results of earlier observations. h is a
hypothesis. i is a prediction of a future observation, which, if we
hypothetically assume h, has a certain probability₁ (i.e., c(i,e . h)),
which we call the likelihood of i. c(h,e) is called the prior confirmation
of h, c(h,e . i) its posterior confirmation. The question is raised: how
much is the confirmation of h increased when the observation predicted by i
actually occurs? The answer is given by the general division theorem (T1c
and d): the ratio of increase of the confirmation of h (i.e., the posterior
confirmation divided by the prior confirmation) is equal to the likelihood
of i divided by c(i,e). Bayes’s theorem (T6) applies this result to the
case of n competing hypotheses of which it is known that one and only one
of them holds. Bayes’s theorem has often been criticized, and it must be
admitted that some formulations of it and many applications of it (using
the principle of indifference) are objectionable. However, there can hardly
be any doubts as to its validity (if formulated correctly), since it is
founded on assumptions which seem accepted by all quantitative theories of
probability₁. The question whether and how the theorem can be applied for
the actual computation of a posterior confirmation will be dealt with in a
later chapter.
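
The relation just stated can be made concrete with a short computation. The sketch below is an illustration only: the function names are hypothetical, and the numerical values (a prior confirmation of 1/5, a likelihood of 7/10, and a value of 3/10 for c(i,e . ~h)) are invented for the example. The value c(i,e) is obtained from the decomposition over h and ~h given in T59-2, and the posterior confirmation from the multiplication theorem T59-1n.

```python
from fractions import Fraction

def expectedness(prior, likelihood, likelihood_if_not_h):
    """c(i,e) = c(h,e) * c(i,e.h) + c(~h,e) * c(i,e.~h)."""
    return prior * likelihood + (1 - prior) * likelihood_if_not_h

def posterior(prior, likelihood, expected):
    """c(h,e.i) = c(h,e) * c(i,e.h) / c(i,e); c(i,e) must be positive (condition C)."""
    if expected <= 0:
        raise ValueError("c(i,e) occurs as a denominator and must be positive")
    return prior * likelihood / expected

# Hypothetical values, chosen only for illustration.
prior = Fraction(1, 5)                # c(h,e)
likelihood = Fraction(7, 10)          # c(i,e.h)
likelihood_if_not_h = Fraction(3, 10) # c(i,e.~h)

expected = expectedness(prior, likelihood, likelihood_if_not_h)
post = posterior(prior, likelihood, expected)

# Ratio of increase: posterior confirmation divided by prior confirmation.
ratio_of_increase = post / prior
```

Here c(i,e) = 19/50 and the posterior confirmation is 7/19, so the ratio of increase, 35/19, coincides with the likelihood divided by c(i,e).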


The theorems of this section are formulated in a general way, with
respect to any sentences. However, they are especially important for
practical application in situations of the following kind. Let e be a
formulation of the evidence available to the observer X at the present
moment, say, a report about the observations made by X. h is a hypothesis
concerning things not known to X; in other words, neither h nor ~h follows
from e. Moreover, h is not simply a prediction of some future directly
observable events. Thus X does not expect to acquire complete knowledge
about h in the future; all he hopes for is to find some evidence which
might give indirect and partial confirmation for h. And, in particular,
there is a sentence i formulating a future observable event which is
connected with h in such a manner that either it follows from e . h or at
least seems probable to a certain degree if h is assumed together with e.
We call c(h,e), i.e., the confirmation of h before the new observation
formulated by i is made, the prior confirmation of h; and its confirmation
after the observation i, i.e., c(h,e . i), the posterior confirmation of h.
[Note that ‘prior confirmation’ does not mean a priori confirmation or null
confirmation, i.e., c(h,t); in general, e is factual and may contain
information relevant for h, e.g., the results of earlier observations
similar to i; ‘prior’ means merely ‘prior to the new observation in
question’.] We are chiefly interested in determining the posterior
confirmation of h and its relation to the prior confirmation, in
particular, whether the confirmation of h is increased when the observation
i is made. If it is increased, we shall say later that i is positively
relevant to h on the evidence e (D65-1a); the problems of positive and
negative relevance and irrelevance will be studied in detail in the next
chapter. As mentioned above, we suppose that i has a certain connection
with h such that if X assumes h hypothetically together with e, then he is
in a position to predict i with a certain probability or even with
certainty. We call this probability, i.e., c(i,e . h), the likelihood of
the observation i (with respect to the hypothesis h and the evidence e).
c(i,e), the probability of i on e alone, without regard to the hypothesis
h, will be called the expectedness of the observation i. In this section
the problem is discussed in a general way for any value of the likelihood.
The next section will deal with the special case that the likelihood is 1,
that is, where the observation i can be predicted with certainty (or almost
certainty) with the help of h.


The terms ‘prior confirmation’ and ‘posterior confirmation’ are adaptations 
of Jeffreys’ terms ‘prior probability’ and ‘posterior probability’. The term ‘like- 
lihood’ is likewise taken from Jeffreys, who uses it in the sense explained 
above; it was earlier introduced by R. A. Fisher in a related but somewhat dif- 
ferent sense (in [Foundations]). The term ‘expectedness’ was suggested to me 
by Herbert Bohnert. 


Examples. 1. Let the evidence e of the observer X include the statement
that the weather today is of the kind M. i is a forecast of the weather
situation M’ for tomorrow. h is a meteorological law saying that a weather
situation M is in 70 per cent of the cases followed by one of the kind M’
on the next day. Suppose that, for a given c-function, the likelihood of i
is 0.7. (This value would result, e.g., for all symmetrical c-functions, as
we shall find later; see To4-1e.) The problem is: How will the confirmation
of h increase if X observes tomorrow that the expected weather M’ actually
occurs? If h is not a merely statistical law but a deterministic law saying
that M is always followed by M’, then we have the special case of the
likelihood 1. 2. h is the assumption, so far not sufficiently tested by X,
that a person Y has a certain disease D. e contains reports about earlier
occurrences of this disease and its correlation with certain symptoms and,
in addition, a report about a few, inconclusive symptoms observed by X in
Y’s case. X intends to make a blood test; i formulates the positive result
of this test, i.e., the one which would be expected as probable if Y were
known to have the disease D. The probability of this expected result is
what we call the likelihood of i.

The theorems of this section hold both for finite and for infinite systems 
under the conditions (A), (B), and (C) stated at the beginning of § 59. 
In these theorems, c-expressions occur frequently as denominators or as 
factors in a denominator; it is to be noted that, according to (C), the 
theorems in question presuppose that those c-expressions have positive 
values. 

Our question as to the posterior confirmation and its relation to the
prior confirmation is answered by the general division theorem T1c, which
was already known in the classical theory of probability; it leads to
Bayes’s Theorem to be stated later (T6). We see from this theorem T1c
that the posterior confirmation of the hypothesis h is (i) proportional to
the prior confirmation of h, (ii) proportional to the likelihood of the
observation i, (iii) inversely proportional to the expectedness of the
observation i; this means that, the more surprising the new observation i
is, in other words, the less X could expect it on the basis of his prior
evidence e, the more does its occurrence increase the confirmation of the
hypothesis h. In the example (2) above, this means that, if the result i of
the blood test is known to X to occur very seldom in the population in
general and hence its occurrence in the case of Y has a low expectedness,
then its actual occurrence in this case strengthens the c of h considerably.

From T1c, T1d follows immediately. It concerns the quotient of the
posterior and the prior confirmation; in other words, the ratio of increase
of the confirmation of h because of the addition of the new observation i
to X’s knowledge. This will later be called the relevance quotient of i for
h on e (§ 66). T1d says that this relevance quotient is equal to that of h
for i on e, i.e., the ratio in which the confirmation of i would be increased
by the addition of h to the knowledge e.


T60-1.

a. c(i,e . h) = c(h . i,e) / c(h,e). (From T59-1n(1).)

b. c(h,e) × c(i,e . h) = c(i,e) × c(h,e . i). (From T59-1n.)

+c. General Division Theorem.
   c(h,e . i) = [c(h,e) × c(i,e . h)] / c(i,e). (From (b).)

+d. c(h,e . i) / c(h,e) = c(i,e . h) / c(i,e). (From (c).)

e. c(i,e . h) = [c(i,e) − c(i . ~h,e)] / c(h,e). (From (a), T59-2a.)
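
The general division theorem T60-1c can be checked numerically on a small model. The following is only a sketch and not part of the original text: it assumes a toy measure function m over four state descriptions (the names `WEIGHTS`, `m`, `c` and the weights themselves are invented for illustration) and takes c(h,e) to be m(e . h)/m(e). Sentences are represented as the sets of state descriptions in which they hold.

```python
from fractions import Fraction

# Hypothetical weights for four state descriptions; they sum to 1.
WEIGHTS = {
    "w1": Fraction(1, 10),
    "w2": Fraction(2, 10),
    "w3": Fraction(3, 10),
    "w4": Fraction(4, 10),
}

def m(sentence):
    """Measure of a sentence, given as the set of state descriptions where it holds."""
    return sum((WEIGHTS[w] for w in sentence), Fraction(0))

def c(h, e):
    """Degree of confirmation of h on e, taken as m(e . h) / m(e); requires m(e) > 0."""
    return m(h & e) / m(e)

e = set(WEIGHTS)        # tautological evidence, for simplicity
h = {"w1", "w2"}        # hypothesis
i = {"w2", "w3"}        # observation

prior = c(h, e)             # c(h,e)
likelihood = c(i, e & h)    # c(i,e . h)
expectedness = c(i, e)      # c(i,e)
posterior = c(h, e & i)     # c(h,e . i)

# T60-1c: posterior = prior x likelihood / expectedness
assert posterior == prior * likelihood / expectedness
# T60-1d: the relevance quotient of i for h equals that of h for i
assert posterior / prior == likelihood / expectedness
```

Exact fractions are used so that the equalities hold without rounding error; any other positive weights would serve equally well.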

The following theorem T2 deals with the comparison of two hypotheses
h1 and h2, both of which make the expected observation i probable. Since
the expectedness c(i,e) is the same, independently of the hypotheses, the
general division theorem (T1c) leads here to the result (T2b) that, in
order to find a comparison of the posterior confirmations of the two
hypotheses, we need only compare the first two of the three factors
mentioned earlier, viz., (i) the prior confirmation of the hypotheses, and
(ii) the likelihood of i for each of the hypotheses. Consequently, if the
hypotheses had approximately the same prior confirmation, but the
observation i is considerably more probable on the assumption of h1 than on
that of h2, then the posterior confirmation of h1 will be considerably
higher than that of h2. [In the example (2) above, suppose that, before the
blood test, two diseases D1 and D2 come into consideration with about equal
prior confirmation; and that the likelihood of the result i, if it is
assumed that the patient has D1, is five times as high as if D2 is assumed;
then, after i is observed, the confirmation of D1 is about five times as
high as that of D2.]

T60-2.

a. (Lemma.) c(h1,e) × c(i,e . h1) × c(h2,e . i) = c(h1,e . i) × c(h2,e) ×
   c(i,e . h2). (From T1b.)

b. c(h1,e . i) / c(h2,e . i) = [c(h1,e) / c(h2,e)] × [c(i,e . h1) / c(i,e . h2)].
   (From (a).)
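
The comparison of two hypotheses in T60-2b can be sketched in a few lines. This is an illustration only: the function name `posterior_ratio` and the numerical values are assumptions chosen to match the two-disease example (equal prior confirmations, a likelihood five times as high on D1 as on D2). The expectedness c(i,e) does not enter, since it cancels.

```python
from fractions import Fraction

def posterior_ratio(prior1, prior2, likelihood1, likelihood2):
    """T60-2b: c(h1,e . i)/c(h2,e . i) as prior ratio times likelihood ratio."""
    return (prior1 / prior2) * (likelihood1 / likelihood2)

# Hypothetical values: equal priors, likelihood of i five times higher on D1.
ratio = posterior_ratio(
    Fraction(1, 10), Fraction(1, 10),   # c(h1,e), c(h2,e)
    Fraction(1, 2), Fraction(1, 10),    # c(i,e . h1), c(i,e . h2)
)
print(ratio)  # 5
```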

In the following theorem T3, the effects of two observations i1 and i2 on
the confirmation of the same hypothesis h are compared. Since c(h,e)
is the same in both cases, only the influence of the last two of the three
factors earlier mentioned is different: (ii) the likelihood of i1 or i2,
respectively, and (iii) the expectedness of i1 or i2, respectively. (T3a is
essentially the same as T2a, only with different letters.) [In the example
of the medical diagnosis, this theorem is applicable if X has to choose
between two different tests T1 and T2; in the first test the result i1 would
confirm the assumption that the patient has the disease D; in the second
test, the result i2 would confirm the same assumption. Before X makes the
tests, he wants to know which of the two results would lead to a higher
posterior confirmation of his assumption; perhaps he prefers that test whose
positive result would give him more certainty. T3b gives the answer. This
theorem would lead X to the following decisions. (1) If the two test
results have about equal expectedness (i.e., probability before the tests),
then X chooses the test whose positive result has higher likelihood, i.e.,
is known to occur more frequently in cases of the disease D. (2) If the
two test results have about equal likelihood, then X chooses the test
whose positive result has a lower expectedness, i.e., is known to him to
occur less frequently in the population in general. Incidentally, in our
discussion of these examples, we have assumed that, the higher the observed
relative frequency of a property, the higher is the confirmation of a future
instance of this property. This is a customary feature of inductive
thinking (cf. § 48B). In our inductive logic this result will appear much
later, in the theory of the predictive inference (§ 110C).]


T60-3.
a. (Lemma.) c(i1,e) × c(h,e . i1) × c(i2,e . h) = c(i1,e . h) × c(i2,e) ×
   c(h,e . i2). (From T2a.)

b. c(h,e . i1) / c(h,e . i2) = [c(i1,e . h) / c(i2,e . h)] × [c(i2,e) / c(i1,e)].
   (From (a).)
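
The choice between two tests described for T60-3 can be illustrated as follows. This is a sketch under stated assumptions: the names `posterior` and `preferred_test` and all numerical values are hypothetical. Each test is described by the likelihood and the expectedness of its positive result, and the posterior confirmation is computed by the general division theorem T60-1c.

```python
from fractions import Fraction

def posterior(prior, likelihood, expectedness):
    """General division theorem T60-1c; expectedness must be positive."""
    return prior * likelihood / expectedness

def preferred_test(prior, tests):
    """Name of the test whose positive result yields the highest posterior."""
    return max(tests, key=lambda name: posterior(prior, *tests[name]))

prior = Fraction(1, 10)   # hypothetical c(h,e)
tests = {
    # name: (likelihood c(i,e . h), expectedness c(i,e)); equal likelihoods
    "T1": (Fraction(9, 10), Fraction(3, 10)),
    "T2": (Fraction(9, 10), Fraction(1, 5)),
}
choice = preferred_test(prior, tests)   # the test with the lower expectedness
```

With equal likelihoods the sketch selects the test whose positive result has the lower expectedness, in agreement with decision (2) in the text.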

The following theorems T5 and T6 apply the general division theorem
(T1c) to the case of several competing hypotheses. T5 deals with the case of
two competitive hypotheses, for instance, two contradictory hypotheses h
and ~h; T6 concerns the general case of n hypotheses of which it is known
(either logically, or by the evidence e, or at least after the observation i,
i.e., by e . i) that one and only one of them holds; but it is not known
which one holds. T6, together with other related theorems, sometimes
also T5, bears traditionally the name of Bayes’s Theorem. The theorem
which Thomas Bayes [Essay] actually stated and proved may be interpreted
as a special case of T6b concerning the inverse inductive inference;
this will be discussed later (in Vol. II).


T60-5. Let c(i,e) > 0. Let c(h1,e . i) + c(h2,e . i) = 1. (This condition
is fulfilled if h2 is ~h1.)
a. c(h1,e . i) = [c(h1,e) × c(i,e . h1)] /
   [c(h1,e) × c(i,e . h1) + c(h2,e) × c(i,e . h2)].

Proof. We replace in T2a ‘c(h2,e . i)’ by ‘1 − c(h1,e . i)’ and multiply out.

b. Let c(h1,e) = c(h2,e).
   Then c(h1,e . i) = c(i,e . h1) / [c(i,e . h1) + c(i,e . h2)]. (From (a).)
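
For two contradictory hypotheses h and ~h, T60-5a can be sketched as below. The function name `posterior_two` and the numbers are assumptions made for illustration in the spirit of the blood-test example; they are not taken from the text.

```python
from fractions import Fraction

def posterior_two(prior1, prior2, likelihood1, likelihood2):
    """T60-5a: posterior of h1 when the posteriors of h1 and h2 sum to 1."""
    r1 = prior1 * likelihood1   # c(h1,e) x c(i,e . h1)
    r2 = prior2 * likelihood2   # c(h2,e) x c(i,e . h2)
    return r1 / (r1 + r2)

# Hypothetical case: h says that Y has disease D, h2 is ~h.
prior_h = Fraction(1, 100)
post = posterior_two(prior_h, 1 - prior_h, Fraction(9, 10), Fraction(1, 20))
print(post)  # 2/13

# T60-5b: with equal priors only the two likelihoods matter.
equal = posterior_two(Fraction(1, 2), Fraction(1, 2), Fraction(9, 10), Fraction(1, 20))
```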
+T60-6. Bayes’s Theorem. Let c(i,e) > 0. Let h1, h2, ..., hn (n ≥ 2) be
such that (1) ⊢ e . i ⊃ h1 ∨ h2 ∨ ... ∨ hn, and (2) the sentences e . i . h1,
e . i . h2, ..., e . i . hn are L-exclusive in pairs (D20-1g). Let h be any
one of the n h-sentences.
a. (1) c(h,e . i) = c(i . h,e) / Σ(p=1..n) c(i . hp,e). (From T1c, T59-3a(2).)

   (2) = [c(h,e) × c(i,e . h)] / Σ(p=1..n) [c(hp,e) × c(i,e . hp)].
       (From (1), T59-1n(2).)

b. Let c(hp,e) have the same value for every p (from 1 to n). Then
   c(h,e . i) = c(i,e . h) / Σ(p=1..n) c(i,e . hp). (From (a).)


T6 refers to n hypotheses h1, ..., hn. The evidence e . i shows (1) that
at least one of them holds (they may, e.g., be L-disjunct), and (2) that
at most one of them holds (they may, e.g., be L-exclusive in pairs,
D20-1g). For hp (p = 1 to n), let c(hp,e) × c(i,e . hp), i.e., the product of
the prior confirmation of hp and the likelihood of i, have the value rp.
Then we know from the general division theorem (T1c) that the posterior
confirmation of hp is proportional to rp; now, Bayes’s theorem T6a says
that the posterior confirmation is rp/Σrp.

T6b concerns the special case where the prior confirmations for all n
hypotheses are equal. Thus here, of the three factors influencing the
posterior confirmation of a hypothesis, only the likelihood is different for
the various hypotheses; and T6b says that the posterior confirmation of
hp is simply the likelihood of i with respect to hp divided by the sum of
the likelihoods of i with respect to all n hypotheses.
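
The normalization expressed by T60-6a, posterior confirmation equal to rp/Σrp, can be sketched as follows. The function name `bayes_posteriors` and the numerical values are hypothetical; the sketch assumes that the hypotheses satisfy conditions (1) and (2) of the theorem, so that the sum of the rp equals c(i,e).

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """T60-6a(2): posterior of each h_p as r_p / sum of all r_p."""
    r = [p * l for p, l in zip(priors, likelihoods)]   # r_p = c(h_p,e) x c(i,e . h_p)
    total = sum(r, Fraction(0))                        # equals c(i,e) under (1), (2)
    if total == 0:
        raise ValueError("c(i,e) must be positive")
    return [rp / total for rp in r]

likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 5)]

# Unequal priors (hypothetical values).
general = bayes_posteriors(
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)], likelihoods
)

# T60-6b: equal priors, so the posteriors are the normalized likelihoods.
equal = bayes_posteriors([Fraction(1, 3)] * 3, likelihoods)
```

In the first call the low likelihood of the first hypothesis exactly offsets its higher prior confirmation, so all three posterior values coincide; in the second call the posterior values are the likelihoods divided by their sum.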

Objections have frequently been raised against Bayes’s theorem and
many applications of it. It must be admitted, I think, that the customary
formulations in the classical period, beginning with Bayes’s own
formulation, contain an obscure point (e.g., phrases like ‘the chance that
the probability of the event lies in a certain interval’). It seems to me
that this obscurity is chiefly due to a confusion of probability1 with
probability2. Further, the theorem has sometimes (not by Bayes himself) been
applied to cases where it led to strange or even absurd results. This was
mostly due to an uncritical use of the principle of indifference. These
mistakes give no reasons for objections against a formulation like our
T6a. This theorem is provable on the basis of our definition of regular
c-functions, and hence likewise on the basis of those weak assumptions
which practically all theories of probability1 seem to have in common.
Therefore for anybody who accepts at all any of these theories, there can
be no doubt concerning the validity of Bayes’s theorem in the form T6a.
The question as to its usefulness is not so easy to answer. First, this
theorem (like all theorems in this chapter except those which state a
c-value 0 or 1) does not enable us actually to compute the c for any given
pair of sentences but says merely how some c-values are connected with other
c-values. In order to apply any of the theorems, we must already know
some c-values. How do we find the first ones? In the classical theory this
was done with the help of the principle of indifference. However, since
this principle leads to contradictions, we have to give it up. Those modern
axiom systems for probability1 which dispense with this principle give
no means to compute any c-value (except 0 or 1). Our later construction
of inductive logic will have the task to furnish other, consistent rules by
which to compute c-values. At the present stage of our discussion, the
situation with respect to Bayes’s theorem is not worse than for the other
theorems; it is just as valid as the other ones, and the other ones are just
as inapplicable as this one.

It has often been said that Bayes’s theorem is essentially different from
the other theorems (those given by us in § 59), that it is of little use or
even of no use because its application by an observer X requires that he
knows, for each one of the hypotheses, not only the likelihood of i but
also the prior confirmation of h; to find the latter is regarded as more
difficult or even as impossible in many or in all cases. If we take the
theorem in the general form given above, there is no essential difference
between the two c-values. However, with respect to the problem of the
inverse inductive inference, for which the theorem has mostly been used
since Bayes, there is a certain core of truth in the view mentioned. Let e
say that a certain population (e.g., the inhabitants of a certain town or the
balls in a certain urn) consists of 1,000 individuals; let i say that in a
certain sample of 10 individuals of this population 8 have the property M
(black-haired persons, black balls); let hp (p = 8 to 998) say that the
whole population contains p individuals with the property M. The task of
finding c(i,e . h) would be a case of the direct inductive inference; the
determination of c(h,e . i) would be an inverse inductive inference (§ 44B).

Bayes’s theorem is usually applied for the second purpose. We see from
T6a that the solution of this second task requires (1) the knowledge of
c(i,e . h) and hence the solution of the first task, and (2) the knowledge
of c(h,e), that is, the prior confirmation for all possible frequencies in the
population before we observe the sample. Both tasks require new rules
in addition to those available in this chapter and in the consistent theories
of probability1 known today. We shall see later that the solution of the
first problem requires only a rather weak and very plausible additional
rule, which has been used implicitly by many authors, to the effect that c
is symmetrical (chap. viii). For the second problem, on the other hand,
a much stronger additional rule is needed. The latter rule is so strong that
it makes our system of quantitative inductive logic complete; it leads to
the definition of c* (§ 110A).
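
The population example (1,000 individuals, a sample of 10 with 8 M) can be worked through with T60-6a once both kinds of c-values are supplied. The following sketch rests on two assumptions that go beyond this chapter and are made only for illustration: the likelihood c(i,e . hp) is taken in the hypergeometric form, and the prior confirmation c(hp,e) is taken to be the same for every p from 8 to 998. The names `likelihood`, `prior`, and `posterior` are hypothetical.

```python
from fractions import Fraction
from math import comb

N, n, k = 1000, 10, 8   # population size, sample size, M-individuals in the sample

def likelihood(p):
    """Assumed c(i,e . h_p): hypergeometric value for k M's in a sample of n."""
    return Fraction(comb(p, k) * comb(N - p, n - k), comb(N, n))

hypotheses = range(8, 999)                  # h_p for p = 8 to 998, as in the text
prior = Fraction(1, len(hypotheses))        # assumed equal prior confirmation

# r_p = c(h_p,e) x c(i,e . h_p); the posterior is r_p / sum of the r_p (T60-6a).
r = {p: prior * likelihood(p) for p in hypotheses}
total = sum(r.values(), Fraction(0))
posterior = {p: rp / total for p, rp in r.items()}

# The hypothesis with the highest posterior confirmation under these assumptions.
best = max(posterior, key=posterior.get)
```

Because the assumed prior is the same for all hypotheses, this is the special case T60-6b, and the posterior values depend on the likelihoods alone. A different prior would shift the result, which is the point made in the text about the second task.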


§ 61. Confirmation of a Hypothesis by a Predictable Observation 


This section deals with a special case of the situation discussed in the
preceding section, the case where the likelihood c(i,e . h) is 1, in other
words, e . h L-implies (or almost L-implies) i. We express this condition
also by saying that the observation i is predictable under the hypothetical
assumption of h, or that h is a suitable hypothesis for the explanation of
i. The most important of the theorems holding for this case is the special
division theorem (T3b and c) which says that the ratio of increase of the c
of h in consequence of the observation i is the reciprocal of the
expectedness of i (i.e., 1/c(i,e)). Thus, for different hypotheses
explaining i, the ratio of the increase of c is the same (T6f).


This section deals with an important special case of the situation
discussed in the preceding section, viz., the case that c(i,e . h), the
likelihood of the observation i, is 1.

In order to clarify the meaning of this condition, let us use two
customary phrases of the word language (not in our technical terminology,
but only in the informal discussion in this section and in similar
discussions later). A certain relation between a hypothesis h and an
observation i (on the basis of given evidence e) may be expressed by saying:
‘i is predictable under the hypothetical assumption of h (together with e)’
or ‘h is a hypothesis capable of explaining i (on the evidence e)’. The
first phrase seems more natural at a time before the observation i is made,
the second seems more natural afterward; however, the logical relation
between the three sentences which the phrases describe is of course
independent of the time point from which we look at the situation; therefore
we may take the two phrases as synonymous. Strictly speaking, however, each
of the phrases may be understood either in a strong sense or in a slightly
weaker sense. They may either be taken as describing the deductive relation
that e . h L-implies i, or as describing the inductive relation that
c(i,e . h) = 1. For 𝔏_N, the two relations coincide (T59-1b, T59-5b). For
𝔏_∞, however, the latter relation is slightly weaker, because it holds not
only if e . h L-implies i but also if e . h almost L-implies i. Thus we have
to distinguish between predictability (or explanation) in the strong
deductive sense and in the weaker inductive sense. The latter sense is meant
in the condition assumed in this section. If i is predictable in the
stronger deductive sense, the theorems of this section hold likewise.

For the examples (1) and (2), this special case has already been
explained in § 60. If the meteorological law h in example (1) is
deterministic, the likelihood of i becomes 1. In example (2), the likelihood
of i is 1, if the assumption h of the disease D makes the result i of the
blood test certain or almost certain.

Many theorems in this section (but in a weaker version, with the
stronger deductive condition that ⊢ e . h ⊃ i) have been stated by Janina
Hosiasson (see below, § 62).


T61-1. Let c(i,e . h) = 1. Then the following holds.

a. c(h . i,e) = c(h,e). (From T59-1n.)

b. c(i,e) = c(h,e) + c(i . ~h,e). (From T59-2a.)

c. c(i,e) = c(h,e) + c(~h,e) × c(i,e . ~h). (From (b), T59-1n.)

d. Let the sentences h, h1, h2, ..., hn fulfil the following two conditions:
(1) ⊢ e . i ⊃ h ∨ h1 ∨ h2 ∨ ... ∨ hn (this condition is always fulfilled
if the h-sentences are L-disjunct); (2) the sentences e . i . h, e . i . h1,
e . i . h2, ..., e . i . hn are L-exclusive in pairs (this condition is
always fulfilled if the h-sentences are L-exclusive in pairs). Then

c(i,e) = c(h,e) + Σ(p=1..n) [c(hp,e) × c(i,e . hp)].

Proof. From (1): ⊢ e . i . ~h ⊃ h1 ∨ h2 ∨ ... ∨ hn (T21-5h(7)). From (2):
for every p (from 1 to n), e . i . hp . h is L-false, hence ⊢ hp . e . i ⊃ ~h,
hence ⊢ hp ⊃ (e . i ⊃ ~h) (T21-5k(1)); therefore, ⊢ h1 ∨ ... ∨ hn ⊃
(e . i ⊃ ~h) (T21-5n(4)), hence ⊢ e . i . (h1 ∨ ... ∨ hn) ⊃ ~h (T21-5k(1)).
Therefore, e . i . ~h is L-equivalent to e . i . (h1 ∨ ... ∨ hn) and hence,
by distribution, to (e . i . h1) ∨ ... ∨ (e . i . hn). Therefore
c(i . ~h,e) = c((i . h1) ∨ ... ∨ (i . hn),e) (T59-2j), = Σ c(i . hp,e)
(T59-1m), = Σ [c(hp,e) × c(i,e . hp)] (T59-1n(2)). Hence theorem with (b).

e. e . h either L-implies or almost L-implies i.

Proof. If in T59-6m ‘i’ is taken for ‘h’, ‘e . h’ for ‘e’, and ‘t’ for ‘i’,
the conditions in the theorem are fulfilled. Hence c(e . h ⊃ i,t) = 1,
= m(e . h ⊃ i) (T57-3). Therefore e . h ⊃ i is either L-true or almost
L-true (D58-1a). Hence the assertion (D58-1c).

T61-2. Let c(h2,e . h1) = c(h1,e . h2) = 1.
a. e . h1 and e . h2 are either L-equivalent or almost L-equivalent.
Proof. m has the value 1 for the following sentences: e . h1 ⊃ h2 (see proof
of T1e), e . h1 ⊃ e . h2, e . h2 ⊃ e . h1 (analogously). Hence the assertion
(T58-3c).

b. c(h1,e) = c(h2,e). (From T1a, T59-1i.)
In the following theorem T3, especially the items (b) and (c) are
important. Since we assume now that the likelihood of the observation i is
1, only the first and third of the three factors influencing the posterior
confirmation of h remain. Thus, the general division theorem (T60-1c) leads
here to the special division theorem T3b, and further to T3c. The latter
says that the ratio of the increase of the c of h by the observation i is
simply the reciprocal of the expectedness of this observation. This means
the following. Before X makes the observation i, he can compute its
expectedness, i.e., its c with respect to the available evidence, without
regard to any hypothesis. As an example, let us assume that the expectedness
of i is 1/10. Suppose now that X actually makes the observation i so that his
evidence increases from e to e . i. Then T3c says this: if h is any
hypothesis which would explain the observation i (in the sense that e . h
L-implies or almost L-implies i), then the c of h grows by the observation i
to ten times its prior value. There are of course many different hypotheses
each of which is capable of explaining i in the sense indicated; some of them
are strong, others weak; for some the prior c is very low, for others not
quite so low (it cannot be higher than 0.1); irrespective of these
differences, the c of each of these hypotheses grows, when X makes the new
observation i, to ten times its prior value (cf. T6f below).

T61-3. Let c(i,e . h) = 1. Then the following holds.

a. c(h,e) = c(i,e) × c(h,e . i). (From T60-1b.)

+b. Special Division Theorem.
    c(h,e . i) = c(h,e) / c(i,e). (From (a).)

+c. c(h,e . i) / c(h,e) = 1 / c(i,e). (From (b).)

d. (Cf. T7.) c(h,e . i) − c(h,e) = c(h,e) × (1/c(i,e) − 1). (From (b).)

e. c(h,e . i) ≥ c(h,e). (From (a), T59-1a.)

f. If c(i,e) = 1, then c(h,e . i) = c(h,e). (From (a).)

g. If c(h,e . i) = c(h,e) > 0, then c(i,e) = 1. (From (a).) (This is a
   restricted converse of (f).)

h. If c(h,e . i) > c(h,e), then c(i,e) < 1. (From (a).)

i. Let c(h,e . i) > 0. If c(i,e) < 1, then c(h,e . i) > c(h,e). (From (a).)
   (This is a restricted converse of (h).)

k. If c(h,e) = 1, then likewise c(h,e . i) = 1. (From (e), T59-1a.)

l. (For 𝔏_∞, let c(i,e) > 0.) If c(h,e) = 0, then likewise c(h,e . i) = 0.
   (From (a).) [For 𝔏_N, the restricting condition need not be stated,
   because, according to assumption (A), e . i is not L-false, and hence
   the condition is fulfilled (T59-3h).]

T3e says that the confirmation of h cannot decrease by the observation i
but either remains the same or increases. T3f to i deal with the latter two
cases separately. They say roughly this: the confirmation of h remains the
same if and only if the expectedness of the observation i is 1, i.e., if this
observation was certain or almost certain beforehand; the confirmation
of h increases if and only if the expectedness of i is < 1, i.e., if i was not
certain beforehand but represents a new fact. T3k and l say that, if the
prior confirmation of h has one of the extreme values 1 or 0, then it
remains unchanged by i.
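
The special division theorem T61-3b and the numerical example with expectedness 1/10 can be sketched as follows. The function name `posterior_predictable` and the two prior values are hypothetical. The check that the prior does not exceed the expectedness reflects T61-3a, from which c(h,e) ≤ c(i,e) follows when the likelihood is 1.

```python
from fractions import Fraction

def posterior_predictable(prior, expectedness):
    """T61-3b: c(h,e . i) = c(h,e)/c(i,e), assuming the likelihood c(i,e . h) is 1."""
    if not 0 < expectedness <= 1:
        raise ValueError("expectedness must lie in (0, 1]")
    if prior > expectedness:
        # By T61-3a, c(h,e) = c(i,e) x c(h,e . i) cannot exceed c(i,e).
        raise ValueError("prior cannot exceed expectedness when h explains i")
    return prior / expectedness

expectedness = Fraction(1, 10)
weak = posterior_predictable(Fraction(1, 20), expectedness)     # a weaker hypothesis
strong = posterior_predictable(Fraction(1, 100), expectedness)  # a stronger hypothesis

# T61-3c: each confirmation grows to ten times its prior value.
assert weak == 10 * Fraction(1, 20)
assert strong == 10 * Fraction(1, 100)
```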

The following theorem deals with the comparison of two observations
i1 and i2, which are predictable by the same hypothesis h.


T61-5. Let c(i1,e . h) = c(i2,e . h) = 1. (For 𝔏_N, this means that e . h
L-implies both i1 and i2.)

a. (Lemma.) c(i1,e) × c(h,e . i1) = c(i2,e) × c(h,e . i2). (From T60-3a.)

b. c(h,e . i1) / c(h,e . i2) = c(i2,e) / c(i1,e). (From (a).)

c. Let c(i1,e) > 0. If c(h,e . i1) < c(h,e . i2), then c(i1,e) > c(i2,e).

Proof. If the two conditions are fulfilled, the two denominators in (b) are
positive. Hence theorem from (b).

d. Let c(h,e . i2) > 0. If c(i1,e) > c(i2,e), then c(h,e . i1) < c(h,e . i2).
(From (b), like (c).)

e. If c(h,e . i1) = c(h,e . i2) > 0, then c(i1,e) = c(i2,e). (From (a).)

f. If c(i1,e) = c(i2,e) > 0, then c(h,e . i1) = c(h,e . i2). (From (a).)


The following theorems T6 and T7 concern two hypotheses h1 and h2,
each of which explains the observation i in the sense of giving to it,
together with the evidence e, the confirmation 1.


T61-6. Let c(i,e . h1) = c(i,e . h2) = 1. (For 𝔏_N, this means that both
e . h1 and e . h2 L-imply i.)

a. (Lemma.) c(h1,e) × c(h2,e . i) = c(h1,e . i) × c(h2,e). (From T60-2a.)

b. c(h1,e . i) / c(h2,e . i) = c(h1,e) / c(h2,e). (From (a).)

c. Let c(h2,e) > 0. (Hence, likewise c(h2,e . i) > 0, T3e.) c(h1,e . i) <
c(h2,e . i) if and only if c(h1,e) < c(h2,e). (From (b), like T5c.)

d. If c(h1,e . i) = c(h2,e . i) > 0, then c(h1,e) = c(h2,e). (From (a).)

e. If c(h1,e) = c(h2,e) > 0, then c(h1,e . i) = c(h2,e . i). (From (a).)

f. c(h1,e . i) / c(h1,e) = c(h2,e . i) / c(h2,e). (From (a).)


T7 differs from T6 by the additional assumption that the expectedness
of i is neither 0 nor 1. For 𝔏_N, this means that i is not L-dependent upon
e, i.e., that e L-implies neither i nor ~i. In the situation earlier discussed,
it was assumed that i describes a possible new observation; in this case,
the condition mentioned is obviously fulfilled.
T61-7. Let c(i,e . h1) = c(i,e . h2) = 1, and 0 < c(i,e) < 1. We write
‘D1’ as short for ‘c(h1,e . i) − c(h1,e)’ and ‘D2’ for ‘c(h2,e . i) − c(h2,e)’.
a. c(h1,e), c(h1,e . i), and D1 are either all three positive or all three 0.
Likewise with c(h2,e), c(h2,e . i), and D2.
b. If c(h2,e) ≥ c(h1,e), then D2 ≥ D1.
c. If c(h2,e) > c(h1,e), then D2 > D1.
d. If c(h2,e) = c(h1,e), then D2 = D1.
Proof for (a), (b), (c), (d). Since c(i,e) > 0, T3d can be applied. Accordingly,
D1 = c(h1,e) × (1/c(i,e) − 1); D2 is analogous. Since c(i,e) < 1, 1/c(i,e) − 1
is positive; it has the same value for h1 and for h2. Hence (a) (with T3a),
(b), (c), (d).




Let us now clarify the content of the theorems T6 and T7. h1 and h2 are
two hypotheses each of which, together with the prior evidence e, explains
the new observation i. Let us assume that the prior and posterior
confirmations of both hypotheses are positive. Then the ratio of the
posterior confirmations of the two hypotheses is the same as that of their
prior confirmations (T6b); hence the c of h2 is higher than that of h1 after
the observation if and only if the same holds before (T6c); and the two c are
equal afterward if and only if they are equal before (T6d,e). The ratio of
the increase of c, which will later be called the relevance quotient (§ 66),
is the same for both hypotheses (T6f). In T7, we make the additional
assumption that i is a possible new fact (0 < c(i,e) < 1). T7 speaks about
the absolute increase of c (D1 and D2, respectively). T7a says that, in this
situation, there are only two possible cases for a hypothesis; either its
prior confirmation is 0, then its posterior confirmation is likewise 0, and
hence the absolute increase is 0; or the prior confirmation is positive, then
the posterior confirmation is likewise positive and greater than the prior
confirmation. T7c says this: if the prior confirmation is higher for h2 than
for h1, then the absolute increase is likewise higher for h2. T7d says that
if the prior confirmations for the two hypotheses are equal, then the
absolute increases of c are equal too.
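
The content of T61-6 and T61-7 can be verified on a small model in which both hypotheses L-imply the observation. This is a sketch only: the measure function, its weights, and the names `WEIGHTS`, `m`, `c` are invented for illustration, with c(h,e) taken as m(e . h)/m(e) and sentences represented as sets of state descriptions.

```python
from fractions import Fraction

# Hypothetical weights for four state descriptions; they sum to 1.
WEIGHTS = {"w1": Fraction(1, 10), "w2": Fraction(2, 10),
           "w3": Fraction(2, 10), "w4": Fraction(5, 10)}

def m(sentence):
    """Measure of a sentence, given as the set of state descriptions where it holds."""
    return sum((WEIGHTS[w] for w in sentence), Fraction(0))

def c(h, e):
    """Degree of confirmation m(e . h) / m(e); requires m(e) > 0."""
    return m(h & e) / m(e)

e = set(WEIGHTS)            # tautological evidence
i = {"w1", "w2", "w3"}      # observation with expectedness 1/2
h1 = {"w1"}                 # both hypotheses are contained in i,
h2 = {"w1", "w2"}           # so the likelihood of i is 1 for each

assert c(i, e & h1) == 1 and c(i, e & h2) == 1

# T61-6f: the relevance quotient is the same for both, namely 1/c(i,e).
quotient1 = c(h1, e & i) / c(h1, e)
quotient2 = c(h2, e & i) / c(h2, e)
assert quotient1 == quotient2 == 1 / c(i, e)

# T61-6b: the ratio of the posteriors equals the ratio of the priors.
assert c(h1, e & i) / c(h2, e & i) == c(h1, e) / c(h2, e)

# T61-7c: the higher prior of h2 gives the higher absolute increase.
D1 = c(h1, e & i) - c(h1, e)
D2 = c(h2, e & i) - c(h2, e)
assert c(h2, e) > c(h1, e) and D2 > D1
```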


§ 62. On Some Axiom Systems by Other Authors 


Some modern axiom systems for probability1 (by Keynes, Waismann,
Mazurkiewicz, Hosiasson, Jeffreys, Koopman, and Wright) are examined. It is
found that their axioms, and hence also their theorems, hold for all regular
c-functions. (Other theories, which contain the principle of indifference, are
inconsistent; if this principle is omitted, then the remainder holds likewise for
all c-functions.) A theory which holds for all regular c-functions and thus, in
other words, admits all possible measure functions, is very weak; it comprises
only a small part of inductive logic. Our task will be to construct the rest of
inductive logic by narrowing the class of functions and finally selecting one
of them. This will be done in later chapters.
In this section we examine some theories of probability1, especially
modern axiom systems, from the point of view of our inductive logic. Our
chief aim is to find out whether these theories contain parts which go
beyond the theory of regular c-functions as represented in the earlier
sections of this chapter.

The great merit of John Maynard Keynes’s work [Probab.] (1921), to
which we have repeatedly referred, lies in his careful critical analysis of
earlier conceptions and theories; the most important point is his criticism
and rejection of the principle of indifference in its classical form.
Furthermore, his positive analyses and discussions contain valuable
constructive contributions to the development of inductive logic. He has
also made the first attempt to construct an axiom system for probability1
([Probab.], chap. xii); however, this formal part of his work is less
satisfactory. Although symbolic logic is used, some points are not quite
clear. As I understand the axioms and those of the so-called definitions
which must be regarded as axioms, they hold, as far as they apply to
numerical values, for all regular c-functions with respect to a finite
domain of individuals.


Keynes’s inductive logic is chiefly of a comparative form; only in cases of a
special kind are numerical values attributed to probabilities. Since we want to
examine here only the quantitative part of his theory, we take his axioms and
definitions as referring to numbers. Then some of the axioms and definitions
become purely arithmetical (e.g., the commutative principle of multiplication,
and the like) and may therefore be left aside for the purpose of our comparison;
these are Definitions XI, XII, and Axioms (iv), (v), and (vi). We may likewise
leave aside the genuine, explicit definitions, i.e., those introducing a term in
such a manner that it can be eliminated; this holds for Definitions I, VI, VII,
VIII, XIII, XIV. (However, we have of course to make use of these definitions
when their terms occur in axioms.) The remaining so-called Definitions are no
definitions at all in the sense indicated; they must rather be regarded as axioms;
this holds for Definitions II, III, IV, V, IX, and X.

We shall now compare the latter Definitions and the remaining Axioms with
theorems which have been stated in this chapter, chiefly in § 59. We add the
mark ‘(F)’ to those of our theorems which hold without restriction only for a
finite domain of individuals. The result is as follows. Keynes’s Definition II
corresponds to our theorems T59-1b and T59-5b (F); likewise, III to T59-1e
and T59-5c (F); IV to T59-5a (F); V to T59-5d (F); IX to T59-2a; X to T59-1n.
Further, Keynes’s Axiom (i) corresponds to our T55-2b (we state this for 𝔏_N
only, but it could, in another system, be made valid for 𝔏_∞ too); Axiom (ii)
follows from T59-1h; Axiom (iii) consists of five parts, the first four of which
are special cases of T59-1c. The fifth part of Axiom (iii) is somewhat obscure
because the term ‘equivalent’ occurs here in an absolute way, although it has
been introduced by Definition VIII only as a term relative to a premise (evidence).
Perhaps the term ‘equivalent’ is here meant in the sense of our ‘L-equivalent’;
if so, the fifth part corresponds to our T59-5b (F) (in view of T20-21).

We see that a number of Keynes’s principles hold only for a finite domain of
individuals. This is the case especially for his identification of certainty with
probability 1 and of impossibility with probability 0 (Definitions II and III,
see also op. cit., p. 128); compare the discussion above, following T59-5. It is
not clear whether he had the intention to restrict his theory to finite domains
or whether he believed that his principles held also in an infinite domain; the
latter seems more likely because, when he speaks about the assumption that
the number of independent qualities is not infinite (p. 256), he remarks that
this assumption does not limit the number of entities or objects. In consequence
of this restriction, whether intentional or not, many theorems and their proofs
are considerably simpler than would otherwise be the case.

Thus the result of the comparison is this: Keynes’s definitions and axioms, 
and therefore likewise his theorems, hold for all regular c-functions with respect 
to a finite domain of individuals. 


As earlier mentioned (see § 55B), Friedrich Waismann ({[Wahrsch.] 
1931) defines probability with the help of measures assigned to sentences 
(or propositions, ““Aussagen”) . For the measure of. sentences, Waismann 
lays down three requirements (op. cit., p. 236); they correspond to our 
Ts57-1g (first part: the measure is a nonnegative real number), D55-2b 
(or T'57-rb), T57-1m. He then defines the probability of one sentence 
with respect to another as in our Ds5-3; here, the requirement should be 
added that the measure of the evidence is not o, because otherwise the 
quotient has no value. Waismann says that, on the basis described, 
Keynes’s axioms can be proved and hence all theorems of “the Calculus 
of Probability” (op. cit., p. 239). However, this obviously does not hold 
for all theorems of the classical theory of probability and, in particular, 
not for those which are proved with the help of the principle of indif- 
ference, but only for those which hold for all regular c-functions. It seems 
that Waismann regards only this part of what we call inductive logic as 
belonging to logic; the determination of the measure function is, in his 
view, not the task of logic but is to be made on the basis of statistical ex- 
perience (op. cit., p. 242). 
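
The quotient definition, together with the proviso about the measure of the evidence, can be sketched as follows. This is only an illustration and not Waismann's own construction: the two-atom toy system, the names STATES, rng, measure, and probability, and the uniform weights are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy system: two atomic sentences A and B, hence four
# state-descriptions, each a pair of truth-values (A, B).
STATES = list(product([True, False], repeat=2))

def rng(pred):
    """Range of a sentence: the state-descriptions in which it holds."""
    return frozenset(s for s in STATES if pred(s))

def measure(weights, sentence):
    """Measure of a sentence: sum of the weights over its range."""
    return sum((weights[s] for s in sentence), Fraction(0))

def probability(weights, h, e):
    """Quotient measure(e . h) / measure(e); None when measure(e) is 0."""
    m_e = measure(weights, e)
    if m_e == 0:
        return None  # the quotient has no value
    return measure(weights, h & e) / m_e

# Assumed measure: equal weight on every state-description.
uniform = {s: Fraction(1, 4) for s in STATES}

A = rng(lambda s: s[0])
B = rng(lambda s: s[1])
contradiction = rng(lambda s: False)

print(probability(uniform, A, B))              # 1/2
print(probability(uniform, A | B, A))          # 1
print(probability(uniform, A, contradiction))  # None
```

Returning None for evidence of measure 0 mirrors the added requirement; any other choice of positive weights summing to 1 could be substituted for the uniform ones.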

Stefan Mazurkiewicz ([Axiomatik] 1932) constructs an axiom system 
for probability. Janina Hosiasson ([Confirmation] 1940, [Induction] 1941) 
adopts Mazurkiewicz’ four axioms but formulates them in terms of degree 
of confirmation. She leaves open the question whether degree of confirma- 
tion is the same as probability ([Induction], p. 354). However, it seems 
clear that both authors have in mind the same concept of probability₁. 
The four axioms I, II, III, and IV of these two authors correspond to our 
theorems T59-1b, l, n, and h, respectively. Therefore all theorems derivable 
from their axioms hold likewise in our theory of regular c-functions. 
In the two articles mentioned, Hosiasson derives a number of theorems 
and uses them for interesting discussions concerning the degree of confirmation. 
They include those corresponding to our theorems T59-3a, T61-1c 
and d, T61-3f, i, k, and l, T61-5d, T61-7c. Most of them concern the 
confirmation of a hypothesis by the observation of an event which is 
predictable with the help of the hypothesis (the condition of predictabil- 
ity is taken in the narrower, deductive sense). 

Harold Jeffreys’ work ([Probab.], 1939) is especially valuable for its 
extensive application of the theory of probability₁, or inductive logic, to 
mathematical problems in statistics. But it contains also more general 
discussions which (if we leave aside some negative remarks concerning 
the frequency conception of probability, see above, § 9) contribute posi- 
tively to the clarification of the nature of inductive logic and its role 
within the method of science. It is to be desired that these discussions 
find as much attention among logicians and authors on scientific method 
as they deserve. The axiom system for probability which Jeffreys constructs 
(op. cit., chap. i) begins with a comparative concept of probability₁: 
“given p, q is more probable than r”, where p, q, and r are propositions. 
Later, real numbers are assigned to the probabilities by certain 
rules, which are called “conventions” in order to emphasize their nature 
as inessential, logically arbitrary stipulations in distinction to the axioms 
proper. In this way, a quantitative inductive logic is constructed on the 
basis of the original comparative one. The advantage of this procedure lies 
in the fact that it shows clearly which of the theorems are based merely 
on the original, purely comparative assumptions; this is analogous to con- 
structing an axiom system of geometry by first laying down a system of 
topology and then strengthening it by additional axioms to a system of 
metrical geometry. Now let us compare Jeffreys’ system with the theory 
of regular c-functions. His system contains certain parts which are due to 
the particular procedure just described and which would become super- 
fluous if the system were constructed from the beginning as a system of 
quantitative probability₁; these items are Axioms 1, 2, 4, and 5, and Convention 1 
(Axiom 4 becomes superfluous because of Convention 2). 
There are no analogues to these items in the theory of regular c-functions. 
The other items in Jeffreys’ system correspond to certain theorems in our 
theory in this way: his Convention 2 corresponds to our T59-1l; his Axiom 
3 yields, together with Convention 2, our T59-1e, and, together with 
Convention 3, our T59-1b; Axiom 6 corresponds to T59-2h. A number of 
the theorems which we have stated in the earlier sections of this chapter 
have been proved by Jeffreys on the basis of his axioms and conventions; 
among them especially T59-2a, e, g, T59-3b, T61-2. 

There are two points in Jeffreys’ axiom system where, it seems to me, 
corrections are necessary; the first is inessential, the second essential. 
1. Jeffreys’ axioms and conventions do not contain any restricting condition 
for the statements of evidence (although Keynes had already excluded 
impossible (logically self-contradictory) propositions (corresponding 
to L-false sentences) as evidence). Consequently, a contradiction is 
derivable from Axiom 3 and Conventions 2 and 3 as follows. p . ~p entails 
both p and ~p. [This holds certainly if ‘entailment’ is used either 
in the sense of Lewis’ ‘strict implication’ or in the sense of our ‘L-impli- 
cation’. And it seems that it holds likewise in Jeffreys’ sense, because I 
understand the footnote on page 17 (op. cit.) as indicating that Jeffreys 
intends to use the term in such a way that a conjunction entails each of 
its components.] Therefore, Convention 3 (“If p entails q, then P(q | p) = 
1”, p. 21) and Theorem 2 (“If p entails ~q, then P(q | p) = 0”, p. 20), 
which is based on Axiom 3 and Convention 2, lead to the result that the 
probability of p on basis p . ~p is both 1 and 0. This defect is of course 
inessential; it can easily be removed without diminishing the intended 
power of the system; all that is needed is the exclusion of self-contradic- 
tory propositions as evidence. Perhaps the author tacitly intended this 
restriction to be imposed. 

2. The second point is more important. Convention 1 says: “We assign 
the larger number on given data to the more probable proposition (and 
therefore equal numbers to equally probable propositions)” (op. cit., 
p. 19). Let us examine the second part, the one included in parentheses. 
It says obviously no more than this: “If p and q are equally probable on 
evidence r (in the sense of the comparative, not yet numerical concept of 
equality of probability), then equal numbers are to be assigned to p and q 
as their probability values on evidence r”. In particular, it does not say 
anything at all as to the conditions for p, q, and r under which we are to 
regard p and q as equally probable on evidence r (in the comparative 
sense); nor are these conditions stated anywhere else in the system. 
Therefore the rule mentioned can never be applied to any particular in- 
stances in the system. However, later in the book (p. 34), the author in- 
terprets Convention 1 surprisingly in such a way that the principle of 
indifference (Laplace’s principle of insufficient reason) is “an immediate 
application” of it; and, moreover, this principle is taken in a rather strong 
sense: “If there is no reason to believe one hypothesis rather than another, 
the probabilities are equal”. And the principle in this strong sense, al- 
legedly derived from Convention 1, is then used in proofs of theorems 
(e.g., p. 104 for (2); p. 111, in 3.23 for (1); p. 193 for (1)). We shall see 
later (in Vol. II) that the principle of indifference in the general form 
leads to contradictions. Thus Convention 1, not in the sense expressed in 
its clear and simple wording, but in the sense in which it is interpreted 
and used by the author, makes the system inconsistent. This contradic- 
tion is essential to the system; if we remove its source, the principle of 
indifference, we deprive the system of some important results. If we take 
the system without the principle of indifference, then all its axioms and 
theorems hold for all regular c-functions, as our earlier comparison shows. 

B. O. Koopman has constructed an axiom system for a comparative 
concept of probability₁ ([Axioms], 1940). It seems that his primitive concept 
“h on the presumption that e is true is equally or less probable than 
h′ on the presumption that e′ is true”, if interpreted in quantitative terms, 
corresponds to ‘c(h,e) ≤ c(h′,e′)’. Interpreted in this way, his axioms hold 
for all regular c-functions. This will be shown later by a comparison with 
our system of comparative inductive logic (§ 83B). 

Georg Henrik von Wright lays down six axioms for probability ([Induction], 
1941, pp. 106 f.). The axioms A₁, A₂, A₃, A₅, and A₆ correspond 
to our theorems T59-1a, b, e, n, and k, respectively. A₄ says: “For a given 
h and a given a the expression P(a|h) has one and only one value”. 
Here, however, the restricting condition should be inserted that h is not 
L-false; otherwise A₂ and A₃ lead to a contradiction in the same way as 
explained above in connection with Jeffreys’ Axiom 3. In a later work 
([Wahrsch.], 1945), Wright constructs an axiom system which, according 
to his intention, should be interpretable in terms of both probability₂ 
(“relative frequency”) and probability₁ (“inductive probability”, 
“reliability of predictions”). His Axiom I states the uniqueness of the 
probability value (as A₄ in the former system). Axioms II, III, IV, V 
correspond to the following of our theorems, respectively: part of T59-1a 
(c ≥ 0), T59-1b, p, n. Axioms VI and VII correspond to special cases of 
T59-1i and h, respectively. 

We have examined only modern axiom systems of probability₁. The 
question may be raised as to the relation between these systems and the 
classical theory of probability. Is the classical theory not essentially 
stronger than the modern systems? It is hardly possible to describe the 
structure and strength of the classical theory in precise terms, because its 
formulations fall far short of the standards of exactness of modern logic. 
Furthermore, the formulations by different authors vary to some extent. 
However, it seems that an examination of the principles and the procedure 
of the classical theory would yield, on the whole, the following result. 

At the first look, the classical theory seems to be much stronger than 
the modern axiom systems; in other words, stronger than a mere theory 
of regular c-functions. And we find indeed many stronger theorems stated 
by classical authors. However, a closer examination shows that the proofs 
for these stronger theorems make use, explicitly or implicitly, of the prin- 
ciple of indifference. The classical theory claims to give a definition for 
probability, based on the concept of equipossible cases. The only rule 
given for the application of the latter concept is the principle of indiffer- 
ence; since we know today that this principle leads to a contradiction, 
there is in fact no definition for the concept of equipossibility. In order to 
base the classical theory on a consistent foundation, we may proceed in 
the following way. We regard it, not as an interpreted theory as it was in- 
tended, but as an uninterpreted axiom system with ‘equipossible cases’ as 
undefined, primitive term without interpretation. Then we take the 
classical definition of ‘probability’ based on ‘equipossible.’ Thus this defi- 
nition is here an uninterpreted axiomatic definition. If we do so (and, in 
addition, make some other necessary modifications, e.g., by inserting references 
to evidence, which are often omitted in classical formulations), then 
we obtain a consistent axiom system. This axiom system, however, is as 
weak as the modern systems described above; it holds likewise for all 
regular c-functions. 

Our discussion in this section leaves aside those axiom systems for 
probability which are intended to be interpreted in terms of the frequency 
concept of probability. These systems are different from those for probability₁ 
not only in the interpretations intended but also in their logical 
form aside from all interpretations. The chief difference lies in the logical 
type of the arguments. The arguments of probability₂ are properties 
(classes); those of probability₁ are propositions or sentences. The primitive 
term in an axiom system for probability₂, say, ‘probability’ or ‘P’, 
can, of course, be interpreted in many different ways, not only by the concept 
of probability₂. However, this term cannot be interpreted directly by 
probability₁ because of the type distinction mentioned. This holds as 
long as we take either sentences or propositions as arguments of probability₁; 
if we took as arguments the ranges of the sentences, hence classes 
of a certain kind, then this modified concept of probability₁ could be taken 
as interpretation for ‘P’. All axiom systems for probability₂, if thus interpreted 
in terms of probability₁, are as weak as the systems for probability₁ 
discussed above; their axioms and theorems hold for all regular c-functions. 
Axiom systems for probability₂ have been given, among others, by 
the following authors: S. Bernstein (1917, in Russian, see Kolmogoroff 
[Wahrsch.]), Reichenbach ([Axiomatik], [Wahrsch.] §§ 12-14), Kolmogoroff 
[Wahrsch.], Dörge [Axiomatisierung], Evans and Kleene [Probab.], 
Cramér ([Statistics], pp. 145). The system by Copeland [Postulates] 
presumably belongs here too, but the formulation is not quite clear in 
this respect; that the system is intended for probability₂ seems likely because 
the author maintains the frequency conception of probability (see 
[Fundamental]). Mises’ theory of probability is not an uninterpreted 
axiom system but an interpreted theory based on an explicit definition of 
probability₂; what he calls axioms are actually parts of this definition. 


Let us sum up the result of our discussion. We have examined the axiom 
systems for probability₁ constructed by Keynes, Waismann, Mazurkiewicz, 
Hosiasson, Jeffreys, Koopman, and Wright. We have found that 
the axioms of these systems correspond to or are immediate consequences 
of theorems on regular c-functions stated in this chapter. Therefore, these 
axioms, and hence likewise all theorems provable in these axiom systems, 
hold for all regular c-functions. Other theories (e.g., the classical theory 
and Jeffreys’ system not as formulated but as used by its author) contain 
the principle of indifference; if we omit this principle, because it leads to 
contradictions, then the remainder holds likewise for all regular c-func- 
tions. 

What follows from this result? If the axioms of a system hold for all 
regular c-functions, then that system represents only a very small part 
of the theory of probability₁. This part, it is true, is of great importance 
because it contains the fundamental relations between c-values. But its 
weakness becomes apparent from the following facts. Let e and h be 
factual sentences in 𝔏_N such that e L-implies neither h nor ~h. Then a 
theory of the kind mentioned does not determine the value of c(h,e). 
Moreover, it does not even impose any restricting conditions upon this 
value; the assignment of any arbitrarily chosen real number between 0 
and 1 is compatible with the theory. We have seen this earlier, in connec- 
tion with T59-5f. A theory of this kind states merely relations between 
c-values; thus, if some c-values are given, others can be computed with 
the help of the theorems. There is an analogous restriction in the theory 
of probability₂; however, here the restriction is necessary. The statement 
of a particular value of probability₂ for two given properties is, in general, 
a factual statement (§ 10B). Therefore, a logicomathematical theory of 
probability₂ cannot yield statements of this kind but must restrict itself to 
stating relations between probability₂ values. On the other hand, in the 
case of a theory of probability₁, there is no reason for this restriction. A 
sentence of the form ‘c(h,e) = r’ is not factual but L-determinate. Therefore, 
a logicomathematical theory of probability₁, in other words, a system 
of inductive logic, can state sentences of this form. The fact that the 
axiom systems for probability₁ restrict themselves to statements which 
hold for all regular c-functions makes these systems unnecessarily weak. 
Thus it becomes clear what our task is to be if we want to construct a 
system of inductive logic that can furnish the answer to inductive problems 
and enables us, among other things, to compute the value of c for 
given sentences. The theory of regular c-functions dealt with in this chapter 
is no more than the first step. We have to lay down further requirements 
in addition to regularity, and thereby finally come to the selection 
of one particular c-function. The additional requirements are to achieve 
what the classical theory intended to achieve by the principle of indifference; 
they must, however, avoid the absurd results which have been derived 
with the help of this principle and the contradictions which can be 
derived. Moreover, and this involves a more serious problem, the c-function 
to which the requirements lead must be an adequate explicatum for 
probability₁. 
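
The weakness of a mere theory of regular c-functions can be made concrete. In the sketch below, two regular m-functions both satisfy a general relation such as c(h,e) + c(~h,e) = 1, yet assign different values to c(h,e); and for any r strictly between 0 and 1 a regular m-function with c(h,e) = r can be written down. The toy system with two atomic sentences, the function names, and the particular weights are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy system: state-descriptions are pairs of truth-values (A, B).
STATES = list(product([True, False], repeat=2))

def rng(pred):
    """Range of a sentence: the state-descriptions in which it holds."""
    return frozenset(s for s in STATES if pred(s))

def m(weights, sentence):
    return sum((weights[s] for s in sentence), Fraction(0))

def c(weights, h, e):
    """c(h,e) = m(e . h)/m(e); assumes e is not L-false."""
    return m(weights, h & e) / m(weights, e)

def is_regular(weights):
    """Regular: positive on every state-description, summing to 1."""
    return all(w > 0 for w in weights.values()) and sum(weights.values()) == 1

def m_with_value(r):
    """A regular m-function for which c(h,e) = r, where 0 < r < 1."""
    r = Fraction(r)
    return {(True, True): r / 2, (False, True): (1 - r) / 2,
            (True, False): Fraction(1, 4), (False, False): Fraction(1, 4)}

h = rng(lambda s: s[0])          # 'A'
not_h = rng(lambda s: not s[0])  # '~A'
e = rng(lambda s: s[1])          # 'B'; e L-implies neither h nor ~h

m_a = {s: Fraction(1, 4) for s in STATES}
m_b = m_with_value(Fraction(7, 8))

for weights in (m_a, m_b):
    assert is_regular(weights)
    # A relation between c-values that holds for every regular c-function.
    assert c(weights, h, e) + c(weights, not_h, e) == 1

print(c(m_a, h, e), c(m_b, h, e))  # 1/2 7/8
```

Nothing in the general relations singles out one of these values; that selection is what the further requirements are meant to supply.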


CHAPTER VI 
RELEVANCE AND IRRELEVANCE 


The theory of relevance and irrelevance deals chiefly with the following situation, 
which has been briefly discussed earlier (§ 60). On the basis of prior evidence 
e, a hypothesis h is considered, and the change in the confirmation of h due 
to an additional evidence i is examined. If the c of h is increased by the addition 
of i to e, i is said to be positively relevant or positive to h on the evidence e; if c 
is decreased, i is said to be negative (to h on e). In these cases i is called relevant 
(to h on e), otherwise irrelevant (§ 65). 

These relevance concepts can be represented in various ways by numerical 
functions of triples of sentences i, h, e. One of these functions is the relevance 
quotient c(h,e . i)/c(h,e). It is clear that i is positive, negative, or irrelevant to 
h on e, if the relevance quotient is >1, <1, or 1, respectively. Theorems on 
this quotient, developed by W. E. Johnson and Keynes, are briefly reported 
here (§ 66); but this concept is not used further on. 

A new numerical function, the relevance measure r, is introduced (§ 67) and 
then applied as the fundamental concept of the theory of relevance developed 
in this chapter. r(i,h,e) is defined on the basis of m-values. It is shown that i is 
positive, negative, or irrelevant to h on e, if r(i,h,e) is positive, negative, or 0, 
respectively. If i is replaced by ~i or h by ~h, r changes to the opposite value. 
Some of the chief problems here discussed and solved concern the relations between 
the relevance of two new observations i and j (to h on e) and the relevance 
of their connections, especially i . j and i ∨ j (§§ 68, 69); further, the 
relations between the relevance of i to two hypotheses h and k (on e) and the 
relevance of i to their connections, especially to h . k and h ∨ k (§§ 70, 71). It 
is found that r is additive in two respects (§ 68): (1) the r of a disjunction with 
L-exclusive components (to h on e) is the sum of the r-values for the components; 
(2) the r of a conjunction with L-disjunct components is the sum of 
the r-values for the components. (1) can be applied, in particular, to the ultimate 
disjunctive components of i; these are the state-descriptions ℨ in the 
range R(i). Thus, the r of i (to h on e) is the sum of the r-values for these ℨ 
(§ 72). If i is positive to h on e and none of these ℨ is negative, i is said to be 
extremely positive to h on e (§ 74). The second additivity (2) can be applied 
especially to the ultimate conjunctive components of i; these are the negations 
of the ℨ in R(~i); we call them the content-elements of i. Thus the r of i (to h 
on e) is the sum of the r-values for the content-elements of i (§ 73). If i is positive 
to h on e and none of the content-elements of i is negative, i is said to be 
completely positive to h on e (§ 75). 


§ 65. The Concepts of Relevance and Irrelevance 


Suppose the new evidence i is added to the prior evidence e for a hypothesis h. 
If the posterior confirmation c(h,e . i) is higher than the prior confirmation 
c(h,e), i is said to be positive to h on the evidence e (D1a). If it is lower, i is said to be 
negative (D1b). In both cases, i is called relevant (D1c). If the c-values are equal 
or if e . i is L-false (in which case c(h,e . i) has no value), i is called irrelevant. 
The most interesting among the theorems answer the questions how relevance 
and irrelevance are influenced by exchanging i and h and by negating either or 
both of them. Among the answers are the theorems of symmetry: if i is positive 
to h on e, then h is positive to i on e (T6a); similarly for negative relevance 
(T6b) and irrelevance (T6d). If i is positive to h on e, then ~i is negative to 
h on e (T6e) and i is negative to ~h on e (T6h(1), (6)). If i is irrelevant to 
h on e, then likewise ~i to h on e (T6g) and i to ~h on e (T6k). The special 
multiplication theorem (T6l) says that, if i is irrelevant to h on e, then the 
c-value for h . i (on e) is the product of those for h and for i. Initial relevance is 
defined as relevance on the tautological evidence (D2). 


The situation which we shall study in this section, and indeed throughout 
this chapter, is essentially the same as that discussed in § 60: an observer 
X is interested in a hypothesis h; he possesses some prior evidence e 
and obtains now additional evidence i or considers the possibility of obtaining 
it. The chief question to be investigated is, how the c of h is influenced 
by the addition of i to e. If the posterior confirmation c(h,e . i) 
is higher than the prior confirmation c(h,e), we shall say that the additional 
evidence i is positively relevant or, simply, positive to the hypothesis 
h on the evidence e (D1a). If it is lower, we shall say that i is negatively 
relevant or negative to h on e (D1b). If the c of h remains unchanged, 
and also in another case, where c cannot be applied, we shall say 
that i is irrelevant to h on e (D1d). Here, as in § 60, the definitions and 
theorems are formulated in a general way for any sentences i, h, and e; but 
they are especially of interest when applied to situations of the kind described. 

These simple, nonnumerical relevance concepts will be discussed in 
this section. The subsequent sections of this chapter will investigate the 
same situation with the help of relevance functions which ascribe a 
numerical value to a triple of sentences i, h, e. This will be done in the 
next section by using the relevance quotient, which has been introduced 
and studied by W. E. Johnson and Keynes, and in the remainder of this 
chapter with the help of a new function, which we shall call the relevance 
measure r. 

The investigation of the problems of relevance and irrelevance in this 
chapter forms a part of the general theory of regular c-functions, which was 
begun in the preceding chapter. Only later shall we restrict the considera- 
tion to a special subclass of the regular c-functions (chap. viii), and still 
later to one particular c-function (§ 110). Thus the results to be found in 
the present chapter hold generally, no matter which particular c-function 
anybody may prefer as an explicatum for the concept of degree of confirmation. 
+D65-1. Let 𝔏 be any finite or infinite system, c be a regular c-function 
in 𝔏, and h, e, and i be sentences in 𝔏. 

a. i is positively relevant or, briefly, positive to h on evidence e (with 
respect to c in 𝔏) =Df c(h,e . i) > c(h,e). 

b. i is negatively relevant or, briefly, negative to h on evidence e (with 
respect to c in 𝔏) =Df c(h,e . i) < c(h,e). 

c. i is relevant to h on evidence e (with respect to c in 𝔏) =Df i is either 
positively relevant or negatively relevant to h on evidence e. 

d. i is irrelevant to h on evidence e (with respect to c in 𝔏) =Df either 
(1) c(h,e . i) = c(h,e), or (2) e . i is L-false. 
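
D65-1 can be read as a three-way classification. A minimal sketch, under assumptions that are not in the text: a finite system whose six state-descriptions are pictured as the outcomes of a die, a uniform regular m-function, and sentences represented by their ranges (Python sets). In such a finite system with a regular m, e . i is L-false exactly when its range is empty.

```python
from fractions import Fraction

STATES = frozenset(range(1, 7))            # hypothetical state-descriptions
M = {s: Fraction(1, 6) for s in STATES}    # assumed regular m-function

def m(sentence):
    return sum((M[s] for s in sentence), Fraction(0))

def c(h, e):
    """c(h,e) = m(e . h)/m(e); only called when e is not L-false."""
    return m(h & e) / m(e)

def relevance(i, h, e):
    """Classify i with respect to h on e according to D65-1."""
    if not (e & i):                # D1d(2): e . i is L-false
        return "irrelevant"
    posterior, prior = c(h, e & i), c(h, e)
    if posterior > prior:
        return "positive"          # D1a
    if posterior < prior:
        return "negative"          # D1b
    return "irrelevant"            # D1d(1)

t = STATES                         # tautological evidence
even = frozenset({2, 4, 6})

print(relevance(frozenset({4, 5, 6}), even, t))     # positive
print(relevance(frozenset({1, 2, 3}), even, t))     # negative
print(relevance(frozenset({1, 2, 3, 4}), even, t))  # irrelevant
print(relevance(frozenset({5, 6}), even, frozenset({1, 2})))  # irrelevant
```

The last call falls under condition (2): the evidence and the new observation have no state-description in common, so no posterior c-value exists.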


These concepts (with the term ‘favourably relevant’ instead of ‘positively 
relevant’) and some of the subsequent theorems (T6a, d, g) are due to 
Keynes ([Probab.], pp. 55, 146 f.). For the definition of irrelevance, he 
uses only the condition (d) (1). He suggests also another, stronger defini- 
tion for irrelevance, which we shall discuss later (at the end of § 75). We 
add the condition (2) in (d), because it turns out that hereby the theorems 
on irrelevance become simpler. If e . i is L-false, c(h,e . i) has no value; 
if e . i is not L-false, then in 𝔏_N both c(h,e . i) and c(h,e) have values. 
Thus the addition of (2) has the effect that in 𝔏_N any sentence i is either 
relevant or irrelevant to h on e (T4g). 


Let us briefly indicate some other consequences of the addition of the condition 
(2), anticipating later explanations and discussions. The criterion for irrelevance 
in terms of m-values becomes simpler; it holds in 𝔏_N without exceptions 
(T4f). Further, due to (2), the theorem of the symmetry of irrelevance holds in 
𝔏_N without exceptions (T6d). Take the following example in 𝔏_N: let e . h be 
L-false; then c(h,e . i) = c(h,e) = 0; thus i is irrelevant to h on e, and (by 
D1d(2)) h is irrelevant to i on e. On the other hand, since c(i,e . h) has no value, 
h is not irrelevant in the narrower sense (d)(1) to i on e; if (2) were omitted, h 
would be said to be neither relevant nor irrelevant to i on e. 

Another simplification effected by (2) is the strict parallelism in 𝔏_N between 
the relevance concepts defined by D1 and the relevance measure r (D67-1, 
T67-10). If (2) were omitted, we should have to say: ‘r(i,h,e) = 0 if and only 
if i is irrelevant to h on e or e . i is L-false.’ We shall find that the numerical 
concept r is more fruitful, leads to more simple and interesting theorems, than 
the simple relevance concepts. Therefore it seems preferable to adjust the lat- 
ter concepts to r rather than vice versa. 

It is to be noticed that for general sentences in 𝔏_∞ and, in particular, for 
almost L-false sentences, a gap between relevance and irrelevance and hence a 
discrepancy between the relevance concepts and r remains. Let e . i be almost 
L-false (D58-1b); then e . i is not L-false but m(e . i) = 0 (T58-3a) and hence 
m(e . i . h) = 0. We shall see that in this case r = 0 but that nevertheless the 
following may happen: c(h,e . i) has a value and is greater than c(h,e) (see the 
example following T67-9), hence i positive to h on e. This shows also that the 
condition (2) cannot be replaced by the condition that m(e . i) = 0. 

An alternative explication for the relevance concepts will here be indicated 
briefly. Consider the case in 𝔏_∞ that ⊢ e . i ⊃ h, hence c(h,e . i) = 1; further, 
not ⊢ e ⊃ h, but, for a given c, nevertheless c(h,e) = 1. In other words, e almost 
L-implies h, e . ~h is almost L-false (D58-1). Thus c is not increased by the 
addition of i to e. Therefore, according to D1, i is called irrelevant to h on e. On 
the other hand, the addition of i changes our knowledge concerning h; before 
this addition, h was only almost certain (i.e., almost L-implied by the evidence 
e available); after the addition, h is certain (i.e., L-implied by the available 
evidence e . i). [Example. Let h be ‘(∃x)Px’ and i be ‘Pb’ in 𝔏_∞. As prior evidence 
e we take the tautology ‘t’. With respect to many adequate c-functions, 
among them our function c* to be introduced later, h is almost L-true; hence 
c(h,t) = c(h,t . i) = 1. On the evidence ‘t’, h is not certain, although almost 
certain; that is to say, X cannot know whether h holds or not. However, as soon 
as X makes the observation that b is P, h follows. Thus his knowledge concerning 
h has clearly changed in a favorable way, although this change is not represented 
by an increase in the value of c.] It might not seem implausible to call i 
in this case positively relevant instead of irrelevant. This suggests the following 
alternative to D1a: 

(D′) i is positive to h on e =Df either (1) c(h,e . i) > c(h,e) or (2) ⊢ e . i ⊃ h 
and not ⊢ e ⊃ h and c(h,e) = c(h,e . i) = 1. 

Analogously, the definition of negative relevance would be modified so as to include 
the case in which c(h,e . i) = c(h,e) = 0, ⊢ e . i ⊃ ~h but not ⊢ e ⊃ ~h 
(hence e . h is almost L-false). And the definition of irrelevance would be 
made narrower by excluding the two cases. This alternative definition coincides 
with D1 as far as any sentences in 𝔏_N and nongeneral sentences in 𝔏_∞ 
are concerned; it differs from D1 only in some special cases of general sentences 
in 𝔏_∞ (where either e . ~h or e . h is almost L-false). Most of the theorems in 
this chapter exclude this special case and thus would remain unchanged if D1 
were replaced by the alternative definitions. In a few theorems, slight changes 
would have to be made. For instance, T67-8 and T67-10 would remain unchanged. 
T67-9 contains the condition that e . i is not almost L-false; here we 
would have to insert in (a) the additional condition that e . ~h is not almost 
L-false, in (b) that e . h is not almost L-false, and in (c) and (d) both. Similar 
additions would be made in T65-4 and some other theorems. 


In the following theorems it is always presupposed that m is any regular 
m-function in the system 𝔏 in question, that c is the regular c-function 
corresponding to m, and that the relevance concepts are meant with 
respect to this c. 

We shall use throughout this chapter the following abbreviations ‘k₁’, 
etc., for certain conjunctions, and ‘m₁’, etc., for their m-values: 


n    k_n                  m_n = m(k_n) 
1    k₁: e . i . h        m₁ 
2    k₂: e . i . ~h       m₂ 
3    k₃: e . ~i . h       m₃ 
4    k₄: e . ~i . ~h      m₄ 

The m-values for other sentences which we need can then easily be 
represented as sums of these m-values: 
T65-1. Lemma. 

a. m(e . i) = m₁ + m₂. 

b. m(e . ~i) = m₃ + m₄. 

c. m(e . h) = m₁ + m₃. 

d. m(e . ~h) = m₂ + m₄. 

e. m(e) = m₁ + m₂ + m₃ + m₄. 

(From T57-1r.) 

T65-2. Let e′ be L-equivalent to e; let i′ be L-equivalent (or L-equivalent 
with respect to e) to i, and likewise h′ to h. If i is positive (or negative, 
or relevant, or irrelevant, respectively) to h on e, then so is i′ to h′ 
on e′. (From T59-1h and i, T59-2).) 

T65-3. If i is relevant to h on e, then e . i and e are not L-false. (From 
D1c, b, a, T55-2b, T56-4b.) 

In the following theorem T4, the items (c), (d), (e), and (f) give sufficient 
and necessary conditions for the four concepts defined in D1. Concerning 
condition (B) in T4 and T6: to say of a sentence which has an 
m-value that it is not almost L-false means that either it is L-false or its 
m-value is > 0 (T58-3a). 

+T65-4. Let e, k, and i be either (i) any sentences in @y, or (ii) any 
nongeneral sentences in lo, or (iii) any sentences in £o fulfilling the follow- 
ing two conditions: (A) m has values for k,, ka, k,, and k, (and hence for 
e, e.h, e.i, and €.i.h, Tx) and (B) neither e nor e . i is almost L-false. 
Then the following holds. 

a, Lemma. Either e is L-false or 

c(h,e) = m(e. h)/m/(e). 

Proof. If e is not L-false, m(e) > o (for 8y, from Ts8-1a; for lœ, from 
(B), T58-3a). Hence the assertion (for £y, from D55-3; for lœ, from T56-4a). 

b. Lemma. Either e.7 is L-false or 

c(h,e 7) = m(e „i a h)/m(e « i). (Analogous to (a).) 
c. 2 is positive to h on e 
(x) if and only if m(e.i.h) X m(e) > m(e.h) X m(e. i); 
(2) if and only if m, X m, > m, X m;. 

Proof. 1:1. Let e «i be L-false. Then c(h,e »i) has no value and hence # is not 
positive. m(e«i) = o, and hence m(e «is h) = o (T57-1s); thus the condition 
in (1) is not fulfilled. II. Let e.i not be L-false. Then e is not L-false, and 
Hee a and m(e) are >o. Hence the assertion (1) by (a) and (b). 2. From 

d. i is negative to h on e
(1) if and only if m(e . i . h) × m(e) < m(e . h) × m(e . i);
(2) if and only if m_1 × m_4 < m_2 × m_3.
(Analogous to (c).)
e. i is relevant to h on e
(1) if and only if m(e . i . h) × m(e) ≠ m(e . h) × m(e . i);
(2) if and only if m_1 × m_4 ≠ m_2 × m_3.
(From (c), (d).)
f. i is irrelevant to h on e
(1) if and only if m(e . i . h) × m(e) = m(e . h) × m(e . i);
(2) if and only if m_1 × m_4 = m_2 × m_3.
Proof. 1. I. Let e . i be L-false. Then i is irrelevant (D1d(2)). m(e . i) = 0,
and hence m(e . i . h) = 0 (T57-1s). Therefore the condition in (1) is fulfilled.
II. Let e . i not be L-false. Then e is not L-false, and m(e . i) and m(e) are >0.
Hence the assertion (1) by D1d(1), (a), (b). 2. From (1), T1.

g. i is either relevant or irrelevant to h on e. (From (e), (f).)
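
The criteria of T65-4c to f can be checked numerically. The following Python sketch is only an illustration: the function names and the sample m-values are assumptions, not part of the text. It classifies i with respect to h on e from the m-values of the four k-sentences, once by form (1) and once by form (2), and confirms on a small grid that the two forms agree.

```python
from fractions import Fraction
from itertools import product


def _compare(lhs, rhs):
    """Map the comparison of the two products to a relevance concept."""
    if lhs > rhs:
        return "positive"
    if lhs < rhs:
        return "negative"
    return "irrelevant"


def classify_form_1(m1, m2, m3, m4):
    """Form (1): compare m(e.i.h) * m(e) with m(e.h) * m(e.i), using T65-1."""
    m_e = m1 + m2 + m3 + m4
    m_ei = m1 + m2
    m_eh = m1 + m3
    return _compare(m1 * m_e, m_eh * m_ei)


def classify_form_2(m1, m2, m3, m4):
    """Form (2): compare m_1 * m_4 with m_2 * m_3."""
    return _compare(m1 * m4, m2 * m3)


# Assumed sample values for m_1, ..., m_4 (not taken from the text).
sample = (Fraction(3, 10), Fraction(1, 10), Fraction(2, 10), Fraction(4, 10))
print(classify_form_2(*sample))

# Both forms should agree for every combination of values on the grid.
grid = [Fraction(k, 10) for k in range(4)]
forms_agree = all(
    classify_form_1(*ms) == classify_form_2(*ms) for ms in product(grid, repeat=4)
)
print(forms_agree)
```

Here "relevant" in the sense of T65-4e corresponds to either of the results "positive" and "negative".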


T65-5. Let e, h, and i be either (i) any sentences in 𝔏_N, or (ii) any
nongeneral sentences in 𝔏_∞.

a. Let m′ and m″ be regular m-functions, c′ be based upon m′ and c″
upon m″ such that i is positive to h on e with respect to c′ but negative
with respect to c″. Then, for every regular m-function m, m_1, m_2, m_3,
and m_4 are >0, and all four k-sentences are non-L-false.

Proof. m′_1 × m′_4 > m′_2 × m′_3 (T4c(2)). Hence m′_1 and m′_4 are >0.
Therefore k_1 and k_4 are non-L-false (T57-1b). Hence, for every regular m,
m_1 and m_4 are >0 (T58-1a). m″_1 × m″_4 < m″_2 × m″_3 (T4d(2)). Hence m″_2
and m″_3 are >0. Therefore k_2 and k_3 are non-L-false. Hence, for every
regular m, m_2 and m_3 are >0.

b. Let the conditions in (a) be fulfilled. Then there is a regular c-function
c such that i is irrelevant to h on e with respect to c.

Proof. The four k-sentences are L-exclusive in pairs and non-L-false (a).
Their disjunction is L-equivalent to e. Therefore we can construct a regular
m-function m such that m_1 = m_2 = m_3 = m_4 = m(e)/4. Let c be based upon m.
Then i is irrelevant to h on e with respect to c (T4f).


c. Let i be relevant to h on e with respect to every regular c-function.
Then i is either positive with respect to every regular c-function or
negative with respect to every regular c-function. (From (b).)

The following theorem shows how relevance and irrelevance are influenced
by exchanging i and h and by negating either or both of them.

+T65-6. Let e, h, and i be either (i) any sentences in 𝔏_N, or (ii) any
nongeneral sentences in 𝔏_∞, or (iii) any sentences in 𝔏_∞ fulfilling the
following two conditions: (A) m has values for k_1, k_2, k_3, and k_4 (and hence
for e, e . i, e . ~i, e . h, e . ~h, T1); (B) none of the sentences e, e . i,
e . ~i, e . h, and e . ~h is almost L-false (hence each of them either
is L-false or its m-value is >0). Then the following holds.

a. Symmetry of positive relevance. If i is positive to h on e, then h is
positive to i on e.
Proof. The condition T4c(1) remains the same if i and h are exchanged,
since m(e . h . i) = m(e . i . h).

Keynes remarks here: “This constitutes a formal demonstration of the 
generally accepted principle that, if a hypothesis helps to explain a phe- 
nomenon, the fact of the phenomenon supports the reality of the hypoth- 
esis” ([Probab.], p. 147). 


b. Symmetry of negative relevance. If i is negative to h on e, then h is
negative to i on e. (From T4d(1).)

c. Symmetry of relevance. If i is relevant to h on e, then h is relevant to
i on e. (From (a), (b).)

d. Symmetry of irrelevance. If i is irrelevant to h on e, then h is
irrelevant to i on e. (From T4f(1).)

e. If i is positive to h on e, then ~i is negative to h on e.

Proof. m(e . i . h) × m(e . ~i . ~h) > m(e . i . ~h) × m(e . ~i . h) (T4c(2)).
Hence m(e . ~i . h) × m(e . i . ~h) < m(e . ~i . ~h) × m(e . i . h).
Hence the assertion (T4d(2) with '~i' for 'i').

f. If i is negative to h on e, then ~i is positive to h on e. (From T4d(2)
and T4c(2), in analogy to (e).)

g. If i is irrelevant to h on e, then so is ~i. (From T4f(2), in analogy
to (e).)

h. The following eight conditions are logically equivalent to one another,
that is to say, if any one of them holds, all others hold too.
(1) i is positive to h on e.
(2) h is positive to i on e.
(3) ~i is negative to h on e.
(4) ~h is negative to i on e.
(5) h is negative to ~i on e.
(6) i is negative to ~h on e.
(7) ~h is positive to ~i on e.
(8) ~i is positive to ~h on e.
Proof. (1) is logically equivalent to (2) (from (a)); likewise (1) to (3) (from
(e), (f), and T2); (2) to (4) (from (e) and (f)); (3) to (5) (from (b)); (4) to (6)
(from (b)); (5) to (7) (from (e) and (f)); (6) to (8) (from (e) and (f)).


i. The following eight conditions are logically equivalent to one another.
(1) ~i is positive to h on e.
(2) h is positive to ~i on e.
(3) i is negative to h on e.
(4) ~h is negative to ~i on e.
(5) h is negative to i on e.
(6) ~i is negative to ~h on e.
(7) ~h is positive to i on e.
(8) i is positive to ~h on e.
Proof. We change in (h) 'i' into '~i'. Thereby '~i' is changed into '~~i',
which may then be changed into 'i' (T2).

j. If any sentence in one of the two classes {i, ~i} and {h, ~h} is 
relevant on e to some sentence in the other class, then each sentence 
in either class is relevant on e to each sentence in the other class. 
(From (h) and (i).) 

k. If any sentence in one of the two classes {i, ~i} and {h, ~h} is
irrelevant on e to some sentence in the other class, then each sentence
in either class is irrelevant on e to each sentence in the other
class. (From (d) and (g).)

l. Special multiplication theorem. If i is irrelevant to h on e (and hence h
irrelevant to i on e, (d)) and e is not L-false, then c(h . i,e) =
c(i,e) × c(h,e).

Proof. m(e) > 0; hence the three c-values exist (as in T4a). If e . i is L-false,
e . i . h is L-false; hence the first two c-values are 0 and the equation is
fulfilled. If e . i is not L-false, c(h,e . i) = c(h,e) (D1d); hence the assertion
by T59-1n(2).


From T6h and i we see the following. Suppose a true statement is given
saying that one sentence is positive (or negative) to another sentence on e.
Then this statement remains true if we exchange the first and the second
sentence; and likewise if we carry out any two of the following three
changes:

(1) in the first sentence we add or drop the sign of negation;
(2) in the second sentence we add or drop the sign of negation;
(3) we change 'positive' to 'negative' (or vice versa).
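
The rule just stated can be tried out on a small example. The following Python sketch is an illustration under stated assumptions: the four m-values, the set representation of sentences, and all names are hypothetical choices, not part of the text. It determines the relevance of one sentence to another by criterion T65-4c(1), d(1), f(1) and then checks the exchange and each pair of the three changes.

```python
from fractions import Fraction

# Assumed m-values for the four k-sentences, keyed by (i is true, h is true).
WEIGHT = {
    (True, True): Fraction(3, 10),    # k_1 = e.i.h
    (True, False): Fraction(1, 10),   # k_2 = e.i.~h
    (False, True): Fraction(2, 10),   # k_3 = e.~i.h
    (False, False): Fraction(4, 10),  # k_4 = e.~i.~h
}
E = frozenset(WEIGHT)
I = frozenset(w for w in E if w[0])
H = frozenset(w for w in E if w[1])
FLIP = {"positive": "negative", "negative": "positive", "irrelevant": "irrelevant"}


def m(sentence):
    """m-value of a sentence, given as the set of k-sentences in which it holds."""
    return sum(WEIGHT[w] for w in sentence)


def neg(sentence):
    """Negation, taken relative to e."""
    return E - sentence


def relation(first, second):
    """Relevance of `first` to `second` on e: sign of m(e.f.s)*m(e) - m(e.s)*m(e.f)."""
    diff = m(first & second) * m(E) - m(second) * m(first)
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "irrelevant"


base = relation(I, H)
checks = {
    "exchange": relation(H, I) == base,
    "changes (1) and (2)": relation(neg(I), neg(H)) == base,
    "changes (1) and (3)": relation(neg(I), H) == FLIP[base],
    "changes (2) and (3)": relation(I, neg(H)) == FLIP[base],
}
print(base, checks)
```

All m-values in this example are positive, so condition (B) of T6 is fulfilled.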


The condition (B) in T6 requires for 𝔏_∞ that certain sentences be not almost
L-false. In order to show the necessity of this restriction, let us consider the
following counterexample in 𝔏_∞ with an almost L-false sentence e . h. Let h be
the law '(x)Mx', where 'M' is a factual molecular predicate. Let e be 'Ma_1 .
Ma_2 . ... . Ma_s'; hence e describes a sample of s cases fulfilling the law h.
⊢ h ⊃ e; hence e . h is L-equivalent to h. Let j be 'Mc', where 'c' is an
individual constant not occurring in e. Then ⊢ h ⊃ j, hence c(j,e . h) = 1 and
c(~j,e . h) = 0. c(j,e) < 1 (T59-5a); hence c(~j,e) > 0 (T59-1p). Therefore h is
positive to j on e, and negative to ~j on e. On the other hand, the following
holds for the function c* to be introduced later (§ 110F (12)) and likewise for
many other regular c-functions. c*(h,e) and c*(h,e . j) are 0. Since ⊢ ~j ⊃ ~h,
c(h,e . ~j) = 0. Therefore j and ~j are irrelevant to h on e. Thus here positive
relevance, negative relevance, relevance, and irrelevance are not symmetric. The
example would violate T6a, b, c, and d, if these theorems were formulated without
restrictions. However, in this example m*(e . h) = m*(h) = 0. Therefore h, since
it is factual, is almost L-false (T58-3a); and the same holds for e . h.


The following two theorems deal with two special cases of irrelevance.

T65-7. Let e . h be L-false in 𝔏; in other words, ⊢ e ⊃ ~h. (This holds
in 𝔏_N, if, but not only if, c(h,e) = 0.) Then the following holds.
a. Lemma. If e is not L-false, c(h,e) = 0. (From T59-1e.)
b. Lemma. For any i such that e . i is not L-false, c(h,e . i) = 0. (From
T59-1e.)
c. Every sentence is irrelevant to h on e. (From D1d, (a), (b).)


T7 says for 𝔏_N that if we once have the confirmation 0 for h, then this
remains so no matter what additional evidence we may find. In this form
the statement is restricted to 𝔏_N; it does not hold generally in 𝔏_∞. It may
happen in 𝔏_∞ that c(h,e) = 0 and still e . h is not L-false but only almost
L-false. It may then be that, to take a trivial example, c(h,e . h) has a
value (which is, of course, impossible if e . h is L-false); if so, this value
is 1 (T59-1b), and hence h itself is positive to h on e. There are also
nontrivial cases of a relevant sentence i, that is to say, cases in which i is
not so strong that ⊢ e . i ⊃ h.

In order to construct a nontrivial example of this kind in 𝔏_∞, we use again
the sentences h, e, and j of the example to T6 and our function c*. Let i be
'(x)[x ≠ c ⊃ Mx]'; it says that M holds for all individuals distinct from c. We
see easily that ⊢ i ⊃ e; further ⊢ h ≡ i . j, and hence ⊢ h ⊃ i, but not
⊢ e . i ⊃ j, and hence not ⊢ e . i ⊃ h. In 𝔏_N, c*(h,e) is >0 but converges
with increasing N toward 0; therefore in 𝔏_∞ c*(h,e) = 0. On the other hand,
c*(h,e . i) = c*(h,i) (T59-1h, because ⊢ i ⊃ e), = c*(i . j,i) (because
⊢ h ≡ i . j), = c*(j,i) (T59-2l). Since not ⊢ i ⊃ j, in 𝔏_N c*(j,i) < 1; but it
converges toward 1, and therefore in 𝔏_∞ c*(h,e . i) = c*(j,i) = 1. Thus i is
positive to h on e, although c*(h,e) = 0.


The following theorem T8 is analogous to T7.

T65-8. Let e . ~h be L-false in 𝔏; in other words, ⊢ e ⊃ h. (This holds
in 𝔏_N if c(h,e) = 1.) Then the following holds.
a. Lemma. If e is not L-false, c(h,e) = 1. (From T59-1b.)
b. Lemma. For any i such that e . i is not L-false, c(h,e . i) = 1.
(From T59-1b.)
c. Every sentence is irrelevant to h on e. (From D1d, (a), (b).)

T8 says for 𝔏_N that, once h is confirmed to the maximum degree 1, then
this will not be changed by any additional evidence. This does not hold
generally for 𝔏_∞.

To give a counterexample in 𝔏_∞, we take again the function c* and the
sentences h, j, and i of the former example. Then c*(h,i) = 1. Nevertheless,
~j is negative to h on i because ⊢ h ⊃ j, hence ⊢ ~j ⊃ ~h, hence
c*(h,i . ~j) = 0.

T65-9. Let e . ~i be L-false in 𝔏; in other words, ⊢ e ⊃ i. Let h be any
sentence. For 𝔏_∞, let either e be L-false or c(h,e) have a value. Then i is
irrelevant to h on e.

Proof. e . i is L-equivalent to e (T21-5i(1)). If e . i is L-false, i is
irrelevant (D1d(2)). If e . i is not L-false, then e is not L-false, and
c(h,e . i) = c(h,e); hence again i is irrelevant (D1d(1)).


Relevance of a state-description ℨ_i in 𝔏_N:

T65-11. Let e be a sentence in 𝔏_N which holds in ℨ_i. (Hence e is not
L-false.) Let h be any sentence not L-implied by e and likewise holding in ℨ_i.
a. ℨ_i is positive to h on e.
Proof. Since not ⊢ e ⊃ h, c(h,e) < 1 (T59-5a). ⊢ ℨ_i ⊃ h and ⊢ ℨ_i ⊃ e
(T20-2t); hence e . ℨ_i is L-equivalent to ℨ_i (T21-5i(1)). Therefore
c(h,e . ℨ_i) = c(h,ℨ_i), = 1 (T59-1b). Hence the assertion.

b. ~ℨ_i is negative to h on e. (From (a), T6e.)


Later, after the introduction of the relevance measure r, we shall make 
a more detailed investigation of the relevance of state-descriptions and 
their negations (§§ 72, 73). 

We shall now introduce concepts of relevance and irrelevance which
are analogous to those defined above but apply to the special case where
the evidence e is tautological. This means that we judge the relevance of i
to h before any factual knowledge is available. If i is relevant to h on the
tautological evidence 't', we shall say that i is initially relevant to h.
For the probability on the evidence 't', classical authors used the term
'probability a priori' and later authors 'initial probability'. We have used
the terms 'null confirmation' and 'initial confirmation' (and the symbol
'c_0', D57-1). For the present concept, the term 'relevance a priori' might
be taken, but the term 'initial relevance' is probably less in danger of
being misinterpreted. The concepts here defined will seldom be used in
the following.


D65-2. Let c be a regular c-function in 𝔏, and let h and i be sentences
in 𝔏.

a. i is initially positive to h (with respect to c in 𝔏) =Df i is positive to h
on evidence 't'.

b. i is initially negative to h (with respect to c in 𝔏) =Df i is negative to
h on evidence 't'.

c. i is initially relevant to h (with respect to c in 𝔏) =Df i is relevant to h
on evidence 't'.

d. i is initially irrelevant to h (with respect to c in 𝔏) =Df i is irrelevant
to h on evidence 't'.


The following theorem T13 is analogous to T4. The items (a) to (d) 
give convenient other forms of sufficient and necessary conditions for the 
four concepts introduced by D2. 


T65-13. Let h and i be either (i) any sentences in 𝔏_N, or (ii) any
nongeneral sentences in 𝔏_∞, or (iii) any sentences in 𝔏_∞ fulfilling the
following two conditions: (A) m has values for i . h, i . ~h, ~i . h, and
~i . ~h (and hence also for i and h), and (B) i is not almost L-false (hence
either i is L-false or m(i) > 0). Then the following holds.

a. i is initially positive to h
(1) if and only if c(h,i) > c_0(h); (from D2a, D1a, D57-1);
(2) if and only if m(i . h) > m(i) × m(h). (From T4c(1).)

b. i is initially negative to h
(1) if and only if c(h,i) < c_0(h);
(2) if and only if m(i . h) < m(i) × m(h). (Analogous to (a).)

c. i is initially relevant to h
(1) if and only if c(h,i) ≠ c_0(h);
(2) if and only if m(i . h) ≠ m(i) × m(h). (From (a), (b).)

d. i is initially irrelevant to h
(1) if and only if c(h,i) = c_0(h) or i is L-false;
(2) if and only if m(i . h) = m(i) × m(h). (Analogous to (a).)

e. i is initially either relevant or irrelevant to h. (From T4g.)

T65-14. Let h be L-false or L-true in 𝔏. Then every sentence is initially
irrelevant to h. (From T7c, T8c.)

T14 is analogous to T7 and T8. Analogues to the other previous theorems
on relevance and irrelevance hold obviously here too; they are simply
special cases with 't' taking the place of e.
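
Form (2) of T65-13a to d compares m(i . h) with m(i) × m(h), so that initial relevance depends on the m-function chosen. The following Python sketch illustrates this with two assumed weightings of the four truth-value combinations of i and h. The weights and all names are hypothetical and serve only as an example.

```python
from fractions import Fraction

# Worlds are truth-value pairs (i is true, h is true); both weightings are assumed.
UNIFORM = {
    (True, True): Fraction(1, 4), (True, False): Fraction(1, 4),
    (False, True): Fraction(1, 4), (False, False): Fraction(1, 4),
}
SKEWED = {
    (True, True): Fraction(1, 3), (True, False): Fraction(1, 6),
    (False, True): Fraction(1, 6), (False, False): Fraction(1, 3),
}


def initial_relevance(weight):
    """Classify i with respect to h on tautological evidence by form (2)."""
    m_i = sum(v for w, v in weight.items() if w[0])
    m_h = sum(v for w, v in weight.items() if w[1])
    m_ih = weight[(True, True)]
    if m_ih > m_i * m_h:
        return "initially positive"
    if m_ih < m_i * m_h:
        return "initially negative"
    return "initially irrelevant"


print(initial_relevance(UNIFORM))  # 1/4 equals 1/2 * 1/2
print(initial_relevance(SKEWED))   # 1/3 exceeds 1/2 * 1/2
```

With the first weighting i is initially irrelevant to h; with the second it is initially positive, although the same two sentences are concerned.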


§ 66. The Relevance Quotient


The simple relevance quotient, symbolized by '{i_1,i_2}_e', is defined as
c(i_1,e . i_2)/c(i_1,e), hence as the quotient of the posterior and prior
confirmations of i_1 (D1a). A related function for more arguments is defined
analogously (D1b). Among the theorems are the following. It is clear that i is
positive, negative, or irrelevant to h on e if {h,i}_e is >1, <1, or =1,
respectively (T2). The c of a conjunction is the product of the c-values for the
conjunctive components and the corresponding relevance quotient (T3). The
relevance quotient is commutative (T4). The definition and the theorems of this
section are due to W. E. Johnson and Keynes. They will not be used further on in
this book.


In this and the subsequent sections we shall deal with numerical functions
of three sentences. They belong to the kind of functions which might
be called relevance functions, since for each of them the value for a triple
of sentences i, h, e is characteristic for i being positive or negative or
irrelevant to h on e. In the present section we shall briefly explain a
function which W. E. Johnson and Keynes have discussed, but which will not
be used further on in this book. In the next section we shall introduce a
new relevance function.

The relevance concepts defined in the preceding section have to do with
the change in the confirmation of h when a new evidence i is added to the
prior evidence e, that is, with the change from c(h,e) to c(h,e . i). i was
called positive if the latter value was greater than the former. Now let us
look for means which will make it possible not only to say that i is
positively relevant but, so to speak, to measure the positive relevance of i.
There are obviously two simple ways for doing so; we may either take the
quotient c(h,e . i)/c(h,e) or the difference c(h,e . i) − c(h,e). i is positively
relevant if the quotient is >1, and also if the difference is >0. Thus both
functions are relevance functions. The first of them, which we call the
relevance quotient, will be dealt with in this section, following Keynes.
Another function which is closely related to the difference will be
introduced in the next section and used throughout the remainder of this
chapter.

Keynes ([Probab.], pp. 150-55) gives a definition of the relevance quo- 
tient, which he calls the coefficient of influence, and a series of theorems 
on this concept, based upon unpublished notes by W. E. Johnson. The 
following exposition follows Keynes in the main lines; we use, however, a 
slightly modified symbol and transfer the whole into our terminology and
symbolism.


D66-1. Let c be a regular c-function in the finite or infinite system 𝔏.
Let e, i_1, i_2, etc., be sentences in 𝔏 whose conjunction is not L-false, and
h any sentence in 𝔏. We define recursively (with respect to c in 𝔏):

a. Simple relevance quotient. {i_1,i_2}_e =Df c(i_1,e . i_2)/c(i_1,e).

b. Multiple relevance quotient (n ≥ 2). {i_1,i_2, ..., i_{n+1}}_e =Df
{i_1,i_2, ..., i_{n-1},i_n . i_{n+1}}_e × {i_n,i_{n+1}}_e.

Keynes writes instead of (b) '{i_1 ^e i_2 ^e i_3 ^e ... ^e i_{n+1}}'. Since all
superscripts in any expression of this form are alike, it seems simpler to
indicate the evidence only once.


Thus we have, for instance, for three arguments:
{h,i,j}_e = {h,i . j}_e × {i,j}_e = [c(h,e . i . j)/c(h,e)] × [c(i,e . j)/c(i,e)].
In the following theorems we omit for brevity the reference 'with respect
to c in 𝔏'. However, the relativity with respect to c must be kept in mind.
We cannot determine the numerical value of a relevance quotient unless
we choose a specific c-function.


T66-1. Let e′ be L-equivalent to e, and for every p = 1 to n (n ≥ 2),
let i′_p be L-equivalent to i_p with respect to e. Then {i′_1,i′_2, ..., i′_n}_{e′} =
{i_1,i_2, ..., i_n}_e. (From D1, T59-1h, T59-2j.)


+T66-2.
a. i is positive to h on evidence e if and only if {h,i}_e > 1. (From
D1a, D65-1a.)
b. i is negative to h on e if and only if {h,i}_e < 1. (From D1a, D65-1b.)
c. i is relevant to h on e if and only if {h,i}_e ≠ 1. (From D65-1c, (a), (b).)
d. i is irrelevant to h on e if and only if either (1) {h,i}_e = 1, or
(2) c(h,e . i) = c(h,e) = 0, or
(3) e . i is L-false.
(From D65-1d, D1a.) (In case (2) the relevance quotient has no
value; in case (3), c(h,e . i), and hence the relevance quotient too,
has no value.)


For the following theorems it is presupposed that e, h, i, i_1, i_2, etc., are
sentences in 𝔏 such that the { }-expressions and c-expressions occurring
have values.

T3 shows how the relevance quotient makes it possible to express the
confirmation of a conjunction in terms of the confirmations of the
components separately.


+T66-3.
a. c(h . i,e) = {h,i}_e × c(h,e) × c(i,e). (From T59-1n(2), D1a.)
b. c(h . i . j,e) = {h,i,j}_e × c(h,e) × c(i,e) × c(j,e).
Proof. c(h . i . j,e) = {h,i . j}_e × c(h,e) × c(i . j,e) (from (a)). c(i . j,e) =
{i,j}_e × c(i,e) × c(j,e) (from (a)). Hence assertion by D1b.
c. c(i_1 . i_2 . ... . i_n,e) = {i_1,i_2, ..., i_n}_e × Π_{p=1}^{n} c(i_p,e).
Proof analogous to (b), by mathematical induction.

T4 states the commutativity of the relevance quotient. T4a is simply the
general division theorem (T60-1d) rewritten in the present notation.

+T66-4.
a. {h,i}_e = {i,h}_e. (From T60-1d.)
b. {i_1,i_2, ..., i_n}_e = {i_{p_1},i_{p_2}, ..., i_{p_n}}_e, where the right-hand
expression contains the same terms as the left-hand expression but in an
arbitrary different order.


Proof. From T3c, because of the commutativity of multiple conjunction and
multiplication.

Note that T4a holds only if both { }-expressions have values. In 𝔏_N,
either both have values (this is the case if and only if e . i and e . h are
not L-false) or both have not. In 𝔏_∞, however, this does not hold. [It may
occur in 𝔏_∞ that c(h,e) = 0 and hence the first { }-expression has no value;
but nevertheless e . h is not L-false, c(i,e . h) has a value, and c(i,e) > 0,
and hence the second { }-expression has a value.]
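
The recursion of D66-1 and the theorems T66-3b and T66-4 can be tried out on a small finite model. The following Python sketch is an illustration under stated assumptions: sentences are represented as sets of eight hypothetical "worlds" with arbitrary positive weights, and the names `m`, `c`, and `quotient` are chosen here for the example only.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: one world per truth-value assignment to (h, i, j),
# with arbitrary positive weights 1/36, 2/36, ..., 8/36 summing to 1.
WORLDS = list(product([True, False], repeat=3))
WEIGHT = {w: Fraction(k, 36) for k, w in enumerate(WORLDS, start=1)}


def m(sentence):
    """m-value of a sentence given as a set of worlds."""
    return sum(WEIGHT[w] for w in sentence)


def c(h, e):
    """Degree of confirmation c(h,e) = m(e.h)/m(e); requires m(e) > 0."""
    return m(h & e) / m(e)


def quotient(args, e):
    """Relevance quotient {i_1, ..., i_n}_e for n >= 2, by the recursion of D66-1."""
    if len(args) == 2:
        first, second = args
        return c(first, e & second) / c(first, e)
    *front, a, b = args
    # Merge the last two terms into a conjunction and multiply by their quotient.
    return quotient(front + [a & b], e) * quotient([a, b], e)


E = frozenset(WORLDS)
H = frozenset(w for w in WORLDS if w[0])
I = frozenset(w for w in WORLDS if w[1])
J = frozenset(w for w in WORLDS if w[2])

# T66-3b: c(h.i.j,e) = {h,i,j}_e * c(h,e) * c(i,e) * c(j,e).
t3b_holds = c(H & I & J, E) == quotient([H, I, J], E) * c(H, E) * c(I, E) * c(J, E)
# T66-4: the quotient does not depend on the order of its terms.
t4_holds = quotient([H, I, J], E) == quotient([J, H, I], E) == quotient([I, J, H], E)
print(t3b_holds, t4_holds)
```

Since all weights are positive, every quotient occurring here has a value; the situation described in the note on T4a cannot arise in such a finite model.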

T66-5. {..., i . j, ...}_e × {i,j}_e = {..., i, j, ...}_e, where the last
{ }-expression is like the first except for containing in the place of the
term 'i . j' the two terms 'i' and 'j' separately. (From D1b, T4b.)

T7 is W. E. Johnson's "Cumulative Formula".


T66-7. Let i be the conjunction i_1 . i_2 . ... . i_n . i_{n+1} (n ≥ 1). Then

    [c(h,e)]^n × c(h,e . i)      {i_1,i_2, ..., i_{n+1}}_{e.h} × Π_{p=1}^{n+1} c(h,e . i_p)
    ------------------------- = -------------------------------------------------------------
    [c(h′,e)]^n × c(h′,e . i)    {i_1,i_2, ..., i_{n+1}}_{e.h′} × Π_{p=1}^{n+1} c(h′,e . i_p)

Proof. With respect to a variation of h, the following proportionalities hold.
c(h,e . i) prop. c(h,e) × c(i,e . h) (T60-2b). Hence

    [c(h,e)]^n × c(h,e . i) prop. [c(h,e)]^{n+1} × c(i,e . h).    (1)

From T3c:

    [c(h,e)]^{n+1} × c(i,e . h) = [c(h,e)]^{n+1} × {i_1,i_2, ..., i_{n+1}}_{e.h}
        × Π_{p=1}^{n+1} c(i_p,e . h).    (2)

If we apply T60-2b to each of i_1, i_2, ..., i_{n+1} instead of i and then equate
the product of the left sides to the product of the right sides and finally
exchange the two sides, we obtain:

    [c(h,e)]^{n+1} × Π_{p=1}^{n+1} c(i_p,e . h) prop. Π_{p=1}^{n+1} c(h,e . i_p).    (3)

The assertion follows from (1), (2), and (3).
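
The formula T66-7 can be checked numerically for the simplest case n = 1, that is, for two items of evidence. The following Python sketch is an illustration under stated assumptions: the eight weighted "worlds", the choice of ~h as the second hypothesis h′, and all names are hypothetical and not part of the text.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: one world per truth-value assignment to (h, i_1, i_2),
# with arbitrary positive weights 1/36, 2/36, ..., 8/36.
WORLDS = list(product([True, False], repeat=3))
WEIGHT = {w: Fraction(k, 36) for k, w in enumerate(WORLDS, start=1)}


def m(sentence):
    return sum(WEIGHT[w] for w in sentence)


def c(h, e):
    """c(h,e) = m(e.h)/m(e); requires m(e) > 0."""
    return m(h & e) / m(e)


def quotient2(a, b, e):
    """Simple relevance quotient {a,b}_e = c(a,e.b)/c(a,e)."""
    return c(a, e & b) / c(a, e)


E = frozenset(WORLDS)
H = frozenset(w for w in WORLDS if w[0])
H_ALT = E - H                       # the rival hypothesis h', here taken as ~h
I1 = frozenset(w for w in WORLDS if w[1])
I2 = frozenset(w for w in WORLDS if w[2])
n = 1                               # two items of evidence, i.e. n + 1 = 2


def left_side(h):
    """[c(h,e)]^n * c(h, e.i_1.i_2)."""
    return c(h, E) ** n * c(h, E & I1 & I2)


def right_side(h):
    """{i_1,i_2}_{e.h} * c(h,e.i_1) * c(h,e.i_2)."""
    return quotient2(I1, I2, E & h) * c(h, E & I1) * c(h, E & I2)


ratio_left = left_side(H) / left_side(H_ALT)
ratio_right = right_side(H) / right_side(H_ALT)
print(ratio_left, ratio_right)
```

For these assumed weights both ratios come out as 1/13. The two sides themselves differ by a factor that does not depend on h, which is why the theorem is stated as a proportionality.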


As Keynes remarks (p. 152, in different terminology), the accumulative
formula is to be applied in the following situation. X has accumulated the
items of evidence i_1, i_2, ..., i_{n+1} in addition to his prior evidence e; he
desires to know the ratio of the c-values of h and h′ and maybe other
hypotheses under consideration, while he knows already the confirmation
of these hypotheses on the evidence of each of the items i_1, i_2, etc.,
separately, together with e. Besides the confirmations just mentioned,
viz., c(h,e . i_p) and c(h′,e . i_p) for every p, the knowledge of two other
sets of values is required: (1) the prior confirmation of the hypotheses
(i.e., c(h,e) and c(h′,e)), and (2) the relevance quotients {i_1,i_2, ..., i_{n+1}}
both on e . h and on e . h′. Keynes remarks that the latter two values for
e . h and e . h′ are not related in any way, even when h′ is L-equivalent
to ~h.

I omit some further theorems stated by Keynes. The concept explained 
will hardly be used in the remainder of this book. It has been represented 
in this section in order to call attention to an interesting concept which 
deserves further investigation. It seems useful for problems of accumula- 
tive evidence, especially within a theory which (like Keynes’s, in distinc- 
tion to the theory which will later be based on the function c*) deals only 
with c-functions in general without choosing a specific one. In the follow- 
ing sections we shall study problems of relevance with the help of another 
relevance function to be introduced, which turns out to be useful for our 
purposes. We shall see that this new function is additive in certain
respects; that makes it helpful for finding the relevance of connections like
i . j and i V j, etc., on the basis of the relevances of i and j.


§ 67. The Relevance Measure 


The numerical function r(i,h,e) is defined as follows: r(i,h,e) =Df m(e . i . h) ×
m(e) − m(e . h) × m(e . i) (D1). We call it the relevance measure of i (to h
on e), because (in 𝔏_N always and in 𝔏_∞ under a certain condition) r(i,h,e) is
>0, <0, or 0, if and only if i is (to h on e) positive, negative, or irrelevant,
respectively (T8, T9, T10). r is commutative, that is, r(i,h,e) = r(h,i,e) (T3). If
i or h is replaced by its negation, r changes to the opposite value (T5). These
properties of r correspond to those of the relevance concepts. r turns out to be
a suitable means for characterizing relevance situations and will therefore be
used continually in the following sections. The chief advantages of r, its
additivity in two respects, will be explained later (§§ 72, 73).


We have earlier seen that the regular m-functions may be regarded as
measure functions for the ranges, because the m-value for the class-sum
(union) of two mutually exclusive ranges is the sum of the m-values for
the two ranges. In terms of sentences, this means that, if i and j are
L-exclusive, m(i V j) = m(i) + m(j) (T57-1m). Analogously, the c-functions
may be regarded as measure-functions for the ranges because they fulfil
an analogous condition as stated in the special addition theorem (T59-1l).
In this section we shall introduce a relevance function r which possesses
the same additivity with respect to exclusive ranges. This is the reason
why we call this relevance function, in distinction to that discussed in the
preceding section, the relevance measure. [r is, however, different from
ordinary measure functions like length, area, volume, etc., by admitting
also negative values.] It will turn out further that this function r, in
contradistinction to m- and c-functions, is simultaneously a measure function
for the contents of sentences (§ 73).

It is the purpose of relevance functions in general to represent the
change in the confirmation of h on e by the addition of a new evidence i.
The relevance quotient did this by way of the quotient of the posterior
confirmation c(h,e . i) and the prior confirmation c(h,e). Another
relevance function is the difference c(h,e . i) − c(h,e); let us call it D_1 for
the moment. D_1 is the amount of the increase of the c of h by the addition of
i to e. It can easily be shown that D_1 has the additivity mentioned above
for r. Therefore it could be taken as a relevance measure. The
disadvantage of D_1 is that it is not commutative with respect to h and i. If we
exchange h and i, we obtain the different function c(i,e . h) − c(i,e); let
us call it D_2. D_2 measures the increase of the c of i by the addition of h to e.
It is likewise additive. Each of the two functions D_1 and D_2 has the
property that its value is >0 if i is positive to h on e and hence also h is
positive to i on e. Positive relevance on a given e is a symmetrical relation
between i and h (T65-6a). Therefore, instead of using two different
functions D_1 and D_2, it will be more convenient to represent the mutual
relevance of i and h on e by one function which is commutative with respect
to i and h, that is, which has the same value for i,h,e as for h,i,e. That is
the chief reason for introducing the function r instead of the two functions
D_1 and D_2. r is closely related to both of them; it is proportional to D_1 in
one respect (T4b) and proportional to D_2 in another (T4e). r also
characterizes mutual positive relevance of i and h by positive values and
negative relevance by negative values (T8, T9, T10). It is true that the
functions D_1 and D_2 have the advantage that it is easier to understand the
meaning of their values as characteristics of the knowledge situations to
which they are applied. It turns out, however, that the function r,
although less intuitive, is a more convenient theoretical tool for the analysis
of problems of relevance. We shall make extensive use of this function
throughout the remainder of this chapter.

We shall find in the next section that r is additive in two respects.
First, the r-value for a disjunction with L-exclusive components is the
sum of the values for the components (T68-1). This is especially useful in
view of the fact that every sentence can be transformed into the
disjunction of the ℨ in its range; and these ℨ are L-exclusive (§ 72). Second, the
r-value for a conjunction with L-disjunct components is the sum of the
values for the components (T68-2); this will later be used for an analysis
of the relevance of a sentence based on the relevances of its
content-elements, i.e., its ultimate conjunctive components (§ 73).


+D67-1. Let 𝔏 be any finite or infinite system, m be a regular m-function
in 𝔏, and i, h, and e be any sentences in 𝔏. The relevance measure
(with respect to m in 𝔏) r(i,h,e) =Df m(e . i . h) × m(e) − m(e . h) ×
m(e . i).

This function is suggested in an obvious way by one form of our previous
criteria for relevance and irrelevance (T65-4c(1), d(1), e(1), and f(1)).
The other form (T65-4c(2), d(2), e(2), and f(2)) corresponds to the
subsequent theorem T1.

+T67-1. r(i,h,e) = m(e . i . h) × m(e . ~i . ~h) − m(e . i . ~h) ×
m(e . ~i . h); hence (in the notation of § 65) = m_1 × m_4 − m_2 × m_3.
(From D1, T65-1.)
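
The agreement between D67-1 and T67-1 is a matter of elementary algebra and can be confirmed numerically. The following Python sketch computes r in both ways from the m-values of the four k-sentences; the function names and the sample values are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import product


def r_by_definition(m1, m2, m3, m4):
    """r as in D67-1: m(e.i.h) * m(e) - m(e.h) * m(e.i), with the sums of T65-1."""
    m_e = m1 + m2 + m3 + m4
    m_ei = m1 + m2
    m_eh = m1 + m3
    return m1 * m_e - m_eh * m_ei


def r_by_t67_1(m1, m2, m3, m4):
    """r as in T67-1: m_1 * m_4 - m_2 * m_3."""
    return m1 * m4 - m2 * m3


# Assumed sample values for m_1, ..., m_4 (not taken from the text).
sample = (Fraction(3, 10), Fraction(1, 10), Fraction(2, 10), Fraction(4, 10))
print(r_by_definition(*sample), r_by_t67_1(*sample))

# The two forms should coincide for every combination of values on the grid.
grid = [Fraction(k, 10) for k in range(4)]
forms_coincide = all(
    r_by_definition(*ms) == r_by_t67_1(*ms) for ms in product(grid, repeat=4)
)
print(forms_coincide)
```

For the sample values both forms give 1/10, a positive value, in accord with the fact that m_1 × m_4 > m_2 × m_3 here.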


T67-2. The domain of definition of r.
a. In 𝔏_N, r has a value for every triple of sentences i,h,e.
b. In 𝔏_∞, r (with respect to a given m) has a value for i,h,e
(1) if and only if m has values for the four sentences e, e . h, e . i,
and e . i . h (from D1);
(2) if and only if m has values for the four sentences e . i . h,
e . i . ~h, e . ~i . h, and e . ~i . ~h. (These are the sentences
k_1, k_2, k_3, and k_4 in the notation of § 65.) (From T1.)


Thus for the existence of r-values there are no restrictions in a finite
system and only weak restricting conditions in the infinite system; the
latter conditions require only that certain m-values exist, not that they are
positive. For some m-functions, this condition is fulfilled for all sentences.
[For example, we shall find later that our function m* has a value for
every sentence in 𝔏_∞ (§ 110A).] For the subsequent theorems here and
throughout this chapter, the following is tacitly presupposed if not
otherwise indicated. 𝔏 is any finite or infinite system; m is any regular
m-function for 𝔏; c is the regular c-function for 𝔏 corresponding to m; r is the
relevance measure with respect to m in 𝔏; the concepts of positive and
negative relevance and irrelevance are meant with respect to c in 𝔏. It is
further presupposed that the sentences occurring as arguments of one of
the functions m, c, or r are such that this function has a value for them.


+T67-3. Commutativity. r(h,i,e) = r(i,h,e). (From D1, T57-1a.)

The following theorem connects r with the two differences of c-values
discussed above and with the relevance quotient discussed in the
preceding section.


T67-4.

a. r(i,h,e) = [c(h,e . i) − c(h,e)] × m(e) × m(e . i).

Proof. 1. Let m(e . i) = 0. Then m(e . i . h) = 0 (T57-1s). Hence both sides
of the equation are 0. 2. Let m(e . i) > 0. Then m(e) > 0 (T57-1s). Then
c(h,e . i) = m(e . i . h)/m(e . i), and c(h,e) = m(e . h)/m(e) (for 𝔏_∞, T56-4a).
Hence the assertion.

b. With respect to a variation of h, r(i,h,e) is proportional to
c(h,e . i) − c(h,e). (From (a).)

c. If m(e . i) > 0 (and hence m(e) > 0, T57-1s), then c(h,e . i) −
c(h,e) = r(i,h,e)/(m(e) × m(e . i)). (From (a).)

d. r(i,h,e) = [c(i,e . h) − c(i,e)] × m(e) × m(e . h).

Proof, by exchanging in (a) 'i' and 'h', and then applying T3.

e. With respect to a variation of i, r(i,h,e) is proportional to c(i,e . h) −
c(i,e). (From (d).)

f. If m(e . h) > 0 (and hence m(e) > 0), then c(i,e . h) − c(i,e) =
r(i,h,e)/(m(e) × m(e . h)). (From (d).)

g. r(i,h,e) = [{h,i}_e − 1] × m(e . h) × m(e . i).

Proof. 1. Let at least one of the two values m(e . h) and m(e . i) be 0. Then
both sides of the equation are 0 (as in the proof of (a)(1)). 2. Let both
m(e . h) > 0 and m(e . i) > 0. Then m(e) > 0 (T57-1s). If we transform the
expression in square brackets according to D66-1a, and then the c-expressions
occurring as in the proof of (a)(2), we obtain the assertion.

h. If both m(e . h) > 0 and m(e . i) > 0, then
{h,i}_e = 1 + r(i,h,e)/(m(e . i) × m(e . h)). (From (g).)


If r(i,h,e) is known, then (c) shows how to determine the increase in
the c of h caused by the addition of i to e, and (f) shows how to determine
the increase in the c of i caused by the addition of h to e. The amounts of
these two increases are in general different. The value of r, however, is the
same for both cases (T3).
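
The connections stated in T67-4 between r, the two differences of c-values, and the relevance quotient can be followed through on a numerical example. The following Python sketch is an illustration only: the function name, the dictionary keys, and the four sample m-values are assumptions, and the m-values are chosen so that m(e) is less than 1.

```python
from fractions import Fraction


def relevance_summary(m1, m2, m3, m4):
    """Compute r, both c-differences, and the relevance quotient from
    m_1 = m(e.i.h), m_2 = m(e.i.~h), m_3 = m(e.~i.h), m_4 = m(e.~i.~h).
    Assumes m(e.i) > 0 and m(e.h) > 0, so that all c-values exist."""
    m_e = m1 + m2 + m3 + m4
    m_ei = m1 + m2
    m_eh = m1 + m3
    c_h_e, c_h_ei = m_eh / m_e, m1 / m_ei   # c(h,e), c(h,e.i)
    c_i_e, c_i_eh = m_ei / m_e, m1 / m_eh   # c(i,e), c(i,e.h)
    return {
        "r": m1 * m4 - m2 * m3,
        "increase_h": c_h_ei - c_h_e,       # increase in the c of h by adding i
        "increase_i": c_i_eh - c_i_e,       # increase in the c of i by adding h
        "quotient": c_h_ei / c_h_e,         # {h,i}_e
        "m_e": m_e, "m_ei": m_ei, "m_eh": m_eh,
    }


# Assumed sample values; here m(e) = 1/2.
s = relevance_summary(Fraction(3, 20), Fraction(1, 20), Fraction(2, 20), Fraction(4, 20))
holds_a = s["r"] == s["increase_h"] * s["m_e"] * s["m_ei"]
holds_d = s["r"] == s["increase_i"] * s["m_e"] * s["m_eh"]
holds_g = s["r"] == (s["quotient"] - 1) * s["m_eh"] * s["m_ei"]
print(s["r"], s["increase_h"], s["increase_i"], holds_a, holds_d, holds_g)
```

In this example the two increases are 1/4 and 1/5, hence different, while r has the single value 1/40.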
T5 determines the relevance measure for the cases that i or h or both
are negated.
+T67-5.
a. r(~i,h,e) = −r(i,h,e).
Proof. r(~i,h,e) = m(e . ~i . h) × m(e . i . ~h) − m(e . ~i . ~h) ×
m(e . i . h) (T1), = m(e . i . ~h) × m(e . ~i . h) − m(e . i . h) ×
m(e . ~i . ~h) = −r(i,h,e) (T1).
b. r(i,~h,e) = −r(i,h,e). (From T3, (a).)
c. r(~i,~h,e) = r(i,h,e). (From (a), (b).)


T5 says that r changes its sign if either i is replaced by its negation
(a) or h is replaced by its negation (b). Hence r remains unchanged if
both replacements are made simultaneously (c). Since, as we shall soon
see, a positive value of r is characteristic of positive relevance and a
negative value of negative relevance, these results are in accord with
earlier theorems on positive and negative relevance (T65-6h). However, the
present theorems are stronger inasmuch as they not only assert the change
from positive to negative relevance but specify the numerical value of the
relevance measure. The fact that this value does not change its absolute
amount but simply its sign, both for the change of i and for that of h,
makes the function r appear as an especially simple means for
characterizing relevance situations.
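The sign changes stated in T5, together with the commutativity T3, can be illustrated with a minimal sketch. It assumes that the four m-values of the conjunctions of e with ±i and ±h are given; the function name `r_cells` and the numbers are hypothetical.

```python
from fractions import Fraction

def r_cells(a, b, c, d):
    """r(i,h,e) by D67-1, written in terms of four m-values:
    a = m(e.i.h), b = m(e.i.~h), c = m(e.~i.h), d = m(e.~i.~h)."""
    m_e, m_ei, m_eh = a + b + c + d, a + b, a + c
    return a * m_e - m_ei * m_eh

# Hypothetical m-values, chosen only for illustration.
a, b, c, d = (Fraction(k, 10) for k in (1, 2, 3, 4))

r_ih = r_cells(a, b, c, d)
r_hi = r_cells(a, c, b, d)        # roles of i and h exchanged (T3)
r_not_i = r_cells(c, d, a, b)     # i replaced by ~i (T5a)
r_not_h = r_cells(b, a, d, c)     # h replaced by ~h (T5b)
r_both = r_cells(d, c, b, a)      # both replaced (T5c)
print(r_ih, r_hi, r_not_i, r_not_h, r_both)
```

Negating a sentence merely permutes the four m-values, which is why only the sign of r changes and not its absolute amount.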

The following theorem T6 says that in certain extreme cases r becomes
0. This holds in particular if e is L-false (b) or e L-implies either h (c) or
~h (a) or i (e) or ~i (d). [In 𝔏_∞, the theorem applies also to nontrivial
cases, where one of the sentences e, e . h, e . ~h, e . i, and e . ~i is
almost L-false.]


T67-6.

a. Let m(e . h) = 0. (This is in particular the case if ⊢ e ⊃ ~h, in
other words, e . h is L-false, e and h are L-exclusive.) Then, for
every i, r(i,h,e) = 0.

Proof. m(e . h . i) = 0 (T57-1s). Hence the assertion by D1.


b. Let m(e) = 0. (This is the case if e is L-false.) Then for every i and
h, r(i,h,e) = 0. (From T57-1s, (a).)
c. Let m(e . ~h) = 0. (This is the case if ⊢ e ⊃ h.) Then, for every
i, r(i,h,e) = 0.
Proof, from (a) with ‘~h’ for ‘h’, and T5b.


d. Let m(e . i) = 0. (This is the case if ⊢ e ⊃ ~i, in other words, e . i
is L-false, e and i are L-exclusive; and in particular if i is L-false.)
Then, for every h, r(i,h,e) = 0. (Analogous to (a).)

e. Let m(e . ~i) = 0. (This is the case if ⊢ e ⊃ i; and in particular
if i is L-true.) Then, for every h, r(i,h,e) = 0. (From (d), T5a.)


The following theorems T8 to T10 show that r fulfils its purpose of
serving as a relevance function; under certain conditions, positive
relevance is characterized by a positive value of r, negative relevance by a
negative value, and irrelevance by the value 0. There are certain
exceptions in 𝔏_∞, which will soon be explained.


+T67-8.
a. If r(i,h,e) > 0, then i is positive to h on e.
Proof. m has values for e, e . h, e . i, and e . i . h (T2b(1)). m(e) and
m(e . i) are > 0 (T6b, d). Hence c(h,e . i) and c(h,e) exist (T56-4a), and
their difference is > 0 (T4c). Hence the assertion by D65-1a.


b. If r(i,h,e) < 0, then i is negative to h on e. (Analogous to (a).)
c. If r(i,h,e) ≠ 0, then i is relevant to h on e. (From (a), (b).)
d. If i is irrelevant to h on e, then r(i,h,e) = 0. (From (c).)


The statements in T9 are restricted converses to those in T8.


+T67-9. Let e . i not be almost L-false; in other words, either e . i is
L-false or m(e . i) > 0. Then the following holds.
a. If i is positive to h on e, then r(i,h,e) > 0.


Proof. e . i is not L-false (T65-3); hence m(e . i) > 0; hence m(e) > 0
(T57-1s). c(h,e . i) − c(h,e) > 0 (D65-1a). Hence the assertion by T4a.


b. If i is negative to h on e, then r(i,h,e) < 0. (Analogous to (a).)
c. If i is relevant to h on e, then r(i,h,e) ≠ 0. (From (a), (b).)
d. If r(i,h,e) = 0, then i is irrelevant to h on e.


Proof. Let e . i not be L-false (otherwise the assertion follows from
D65-1d(2)). Then m(e . i) > 0, and hence m(e) > 0. Therefore c(h,e . i) =
m(e . i . h)/m(e . i), and c(h,e) = m(e . h)/m(e) (T56-4a). [This, the
existence of the two c-values, is the decisive point in this proof; because
of this point, we cannot derive (d) directly from (c).] The difference of
the two c-values is 0 (T4c). Hence the assertion by D65-1d(1).

The restricting condition in T9 requiring that e . i not be almost L-false
applies, of course, only to 𝔏_∞. It must be required for 𝔏_∞ because here
the following can happen: c(h,e . i) and c(h,e) have values, hence e . i is
not L-false; the first of the two values is greater, hence i is positive to
h on e; however, m(e . i) = 0 (e . i is almost L-false) and hence
m(e . i . h) = 0; therefore r(i,h,e) = 0. As an example, take the sentences
e, h, and i mentioned in the discussions following T65-6 and T65-7. Then the
following holds in 𝔏_∞ for certain m and c (among them our functions m* and
c*): i and h and hence e . i and e . h are almost L-false; c(h,e) = 0,
c(h,e . i) = 1, hence i is positive to h on e; on the other hand, m(i) = 0,
hence m(e . i) = m(e . i . h) = 0, hence r(i,h,e) = 0.

T10 says that the parallelism between the relevance concepts and r
holds in 𝔏_N without any restriction. The same holds for 𝔏_∞, if e and i are
nongeneral, since in this case e . i cannot be almost L-false (T58-3e).


+T67-10. For 𝔏_N.
a. The following four conditions are logically equivalent, that is to say,
if any of them holds, all others hold too:
(1) r(i,h,e) > 0;
(2) i is positive to h on e;
(3) r(h,i,e) > 0;
(4) h is positive to i on e.
(From T8a, T9a; T3.)

b. The following four conditions are logically equivalent:
(1) r(i,h,e) < 0;
(2) i is negative to h on e;
(3) r(h,i,e) < 0;
(4) h is negative to i on e.
(From T8b, T9b; T3.)

c. The following four conditions are logically equivalent:
(1) r(i,h,e) ≠ 0;
(2) i is relevant to h on e;
(3) r(h,i,e) ≠ 0;
(4) h is relevant to i on e.
(From T8c, T9c; T3.)

d. The following four conditions are logically equivalent:
(1) r(i,h,e) = 0;
(2) i is irrelevant to h on e;
(3) r(h,i,e) = 0;
(4) h is irrelevant to i on e.
(From T8d, T9d; T3.)


T10 will be used frequently in the following sections, often without
explicit reference.
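For a finite case in which all relevant m-values are positive, the parallelism asserted in T10 can be checked mechanically. The sketch below is an illustration only: the integer weights are hypothetical and unnormalized (this leaves the sign of r and the c-values unaffected), and the function names are not taken from the text.

```python
from fractions import Fraction
from itertools import product

def sign(x):
    return (x > 0) - (x < 0)

def relevance_by_c(a, b, c, d):
    """Sign of c(h,e.i) - c(h,e); requires m(e.i) > 0.
    a, b, c, d weigh e.i.h, e.i.~h, e.~i.h, e.~i.~h."""
    return sign(Fraction(a, a + b) - Fraction(a + c, a + b + c + d))

def relevance_by_r(a, b, c, d):
    """Sign of r(i,h,e) from D67-1 (weights need not be normalized)."""
    return sign(a * (a + b + c + d) - (a + b) * (a + c))

grid = list(product(range(1, 5), repeat=4))   # 256 hypothetical weightings
agree = all(relevance_by_c(*w) == relevance_by_r(*w) for w in grid)
# Exchanging i and h swaps the weights b and c; the sign must stay the same.
symmetric = all(relevance_by_r(a, b, c, d) == relevance_by_r(a, c, b, d)
                for a, b, c, d in grid)
print(agree, symmetric)
```

The value 1 of `sign` corresponds to positive relevance, −1 to negative relevance, and 0 to irrelevance.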


§ 68. Relevance Measures for Two Observations and Their Connections 


The problems discussed in this section and the next one concern the
relations between the relevance measures r of i and of j (to h on e), on the
one hand, and those of certain connections of i and j, especially i V j and
i . j, on the other. Two theorems of additivity are found: (1) if i and j are
L-exclusive with respect to e, then the r-value for i V j (to h on e) is the
sum of the values for i and for j (T1b); (2) if i and j are L-disjunct with
respect to e, then the r-value for i . j (to h on e) is the sum of the values
for i and for j (T2b). The first result is analogous to the special addition
theorems for m and for c; the second result marks an essential difference
between r and those two functions. The twofold additivity of r is important
for our further analysis of relevance.

If the r-values (to h on e) for i . j, i . ~j, ~i . j, and ~i . ~j are
given, the r-values for i, j, and i V j can be obtained as sums of one, two,
or three of those four r-values. This is done first generally (T3) and then
for all possible cases of deductive relations between i and j on the
evidence e (T4).

In this section we shall develop theorems which tell us whether and how
the relevance measures of two sentences i and j (to h on e) determine the
relevance measures of their connections, especially of i . j and i V j. These
theorems will enable us to deal with problems like this: suppose we know
that i is positive to h on e (or negative, or irrelevant) and further that j
is positive to h on e (or negative, or irrelevant), what can we infer
concerning the relevance of i . j, or of i V j? Problems of this kind occur
frequently in science and in everyday life. For example, some scientists are
jointly interested in a certain hypothesis h, which may be a general theory
or a singular prediction; they pool their prior evidence e; then they start
separately to look for further observational material relevant to h. Suppose
now one of these scientists reports to the group that he has made
observations (i) which are positive to h (on e); then another one reports
that he has found evidence (j) which is positive (or negative or irrelevant)
to h. Thereupon the group wishes to know what is the relevance to h of
the two reports i and j taken together, that is, of the conjunction i . j.
In other situations, the problem concerns the relevance of the disjunction
i V j. Suppose, for example, X is interested in a hypothesis concerning
the influence of certain rarely occurring conditions on pneumonia.
Since he knows so far of only a small number of relevant cases, he is
interested in every new report on a case where those particular conditions
occurred, even if the report is not as specific as he might wish. Now he
receives a new report: the particular conditions did occur, and many other
details are reported; however, it has not been examined whether it was a
case of virus pneumonia or of bacillus pneumonia. After careful
deliberation, X comes to the conclusion that, if the report had stated the
first, it would have positive relevance to his hypothesis; if the second, it
would be positive too (or negative, or irrelevant). And now he wishes to
answer the question, what is the relevance of the actual report from which
he learns merely that either the one or the other was the case?

There are analogous problems concerning two hypotheses. If we know
the relevance of i to h on e, and also that of i to k on e, what is the
relevance of i to h . k or to h V k? These problems will be discussed in a
later section (§ 70).

Even in a complete system of inductive logic, that is to say, a theory
based upon a specific c-function, it is useful to have general theorems
saying under what conditions the c increases or decreases, because these
theorems might often save us the trouble of computing the prior and
posterior confirmations and thus determining the increase or decrease.
The importance of general theorems of this kind is still greater in the
theory dealt with in this and the preceding chapters, the general theory of
regular c-functions. As long as we do not go beyond this theory and choose
a specific c-function as our concept of degree of confirmation, we cannot
determine any c-values (except the extreme values 0 and 1), and thus
there is no other way of determining positive or negative relevance than
with the help of general theorems. It is true that we shall later take the
further step to a complete inductive logic. However, we cannot expect to
find general agreement with our specific choice of a function, while there
is a practically general agreement with respect to those assumptions
which underlie the theory of regular c-functions (see § 62). Therefore it is
important to establish as many results as possible on this basis common
to all theories. Thus, for example, it would be of great interest to discover
whether the following statement holds generally: if i is positive to h on e
and j is irrelevant, then i V j is positive; or, if this does not hold
generally, whether at least the weaker statement holds that under those
conditions i V j is not negative to h on e.

One might perhaps think that questions of this kind could be answered
without elaborate technical analyses; that, for example, statements of the
following kind could be established simply by common sense: (i) if i is
positive to h on e and j is negative, then for i . j three cases are possible
depending upon the particular nature of the sentences involved: i . j may
be positive (if i has, so to speak, a stronger influence than j), or negative
(in the inverse case), or it may be irrelevant (if i and j cancel out each
other); (ii) if both i and j are positive to h on e, then i . j is positive too.
To this it should be remarked first that it is always at least of theoretical
interest to reconstruct plausible and even indubitable relationships within
a systematic theory, for example, to prove in the propositional calculus
that from i . j we can derive j . i. Furthermore, the superficial appearance
of plausibility in inductive logic is very often entirely misleading. Thus,
for example, of the two statements mentioned (i) is right, but (ii) is
wrong; the conjunction of two sentences which are positive to h on e may
indeed be positive too, but it may also be irrelevant and even negative. In
other words, it is possible that each of two reports i and j, if added to our
prior evidence, increases the probability of a certain future event, and
nevertheless the simultaneous addition of both reports makes the event
less probable. This is the first of four possible cases listed below which
may appear rather surprising at first glance. The possibilities are meant
in this way: for any c-function that comes at all into consideration as an
explicatum, we can easily find sentences e, h, i, and j which exhibit these
relationships.

The following cases are possible:

1. Both i and j are positive to h on e, but i . j is nevertheless negative
to h on e.

2. Both i and j are positive to h on e, but i V j is negative.

3. i is positive both to h and to k on e, but it is negative to h . k.

4. i is positive both to h and to k on e, but it is negative to h V k.


These possibilities will not only be proved technically but also made
plausible, intuitively understandable, with the help of simple examples.
The cases (3) and (4) will be dealt with in the later discussion concerning
connections of two hypotheses (§ 71).
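Cases (1) and (2) can be made concrete with a small numerical sketch. The figures below are hypothetical and are not the examples given later in the text: a population of forty individuals is assumed, ten in each of the classes i . j, i . ~j, ~i . j, ~i . ~j, and c is taken as the relative frequency of h within the class described by the evidence.

```python
from fractions import Fraction

def confirmations(h_counts, cell_size=10):
    """h_counts: number of h-cases among the individuals that are
    i.j, i.~j, ~i.j, ~i.~j (each class has `cell_size` members)."""
    n1, n2, n3, n4 = h_counts
    return {
        "prior": Fraction(n1 + n2 + n3 + n4, 4 * cell_size),  # c(h,e)
        "i": Fraction(n1 + n2, 2 * cell_size),                # c(h,e.i)
        "j": Fraction(n1 + n3, 2 * cell_size),                # c(h,e.j)
        "i.j": Fraction(n1, cell_size),                       # c(h,e.i.j)
        "iVj": Fraction(n1 + n2 + n3, 3 * cell_size),         # c(h,e.(iVj))
    }

case1 = confirmations((2, 9, 9, 0))   # i and j positive, i.j negative
case2 = confirmations((8, 3, 3, 6))   # i and j positive, iVj negative
for case in (case1, case2):
    print({k: str(v) for k, v in case.items()})
```

In both cases the prior value is 1/2 and each of i and j alone raises it to 11/20; in the first case the conjunction lowers it to 1/5, and in the second the disjunction lowers it to 7/15.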

In the following discussions we shall often make use of the relative
L-terms ‘with respect to (the given evidence) e’ (see D20-2).

In the following two theorems, the parts T1b and T2b are of especial
importance. They state the additivity of r for disjunctions and
conjunctions under certain conditions and are fundamental for a great part
of our further discussions in this chapter.


T68-1. Additivity for disjunctions.
a. r(i V j,h,e) = r(i,h,e) + r(j,h,e) − r(i . j,h,e).
Proof. r(i V j,h,e) = m(e . h . (i V j)) × m(e) − m(e . h) × m(e . (i V j))
(D67-1),
= m[(e . h . i) V (e . h . j)] × m(e) − m(e . h) × m[(e . i) V (e . j)]
(T21-5m(1)),
= [m(e . h . i) + m(e . h . j) − m(e . h . i . j)] × m(e) − m(e . h) ×
[m(e . i) + m(e . j) − m(e . i . j)] (T57-1k),
= [m(e . h . i) × m(e) − m(e . h) × m(e . i)] + [m(e . h . j) × m(e) −
m(e . h) × m(e . j)] − [m(e . h . i . j) × m(e) − m(e . h) × m(e . i . j)].
Hence the assertion by D67-1.


+b. Let m(e . i . j) = 0. (This is the case in particular if e . i . j is
L-false, in other words, i and j are L-exclusive with respect to e.) Then
r(i V j,h,e) = r(i,h,e) + r(j,h,e).
Proof. r(i . j,h,e) = 0 (T67-6d). Hence the assertion by (a).


c. Let i be a disjunction with n (≥ 2) components: i_1 V i_2 V ... V i_n.
For any two distinct components i_m and i_p, let m(e . i_m . i_p) = 0.
(This is the case if i_1, ..., i_n are L-exclusive in pairs with respect
to e.) Then r(i,h,e) = Σ r(i_p,h,e), summed over p = 1 to n.


Proof. Let i′ be i_1 V i_2 V ... V i_{n−1}. Then e . i′ . i_n is
L-equivalent to (e . i_1 . i_n) V (e . i_2 . i_n) V ... V (e . i_{n−1} . i_n)
(T21-5m(2)). For any component in the latter disjunction and hence (T57-1s)
also for any conjunction of two or more of them, m = 0. Hence
m(e . i′ . i_n) = 0 (T57-1v). Therefore, since i is i′ V i_n, r(i,h,e) =
r(i′,h,e) + r(i_n,h,e) (b). If the assertion of the theorem holds for
n − 1, then r(i′,h,e) = Σ r(i_p,h,e), summed over p = 1 to n − 1; hence,
with the result just found, the assertion for n follows. The assertion
holds for n = 2 (b). Therefore, by mathematical induction, it holds for
every n ≥ 2.
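The general and the special addition theorem for disjunctions (T1a, T1b) can be checked on a finite example. This is a sketch under assumptions: the eight weights stand for the m-values of the conjunctions of e with ±i, ±j, ±h and are hypothetical, as are the helper names.

```python
from fractions import Fraction
from itertools import product

KEYS = list(product([True, False], repeat=3))  # truth-values of (i, j, h)

def make_r(weights):
    """Return x -> r(x,h,e) by D67-1, where `weights` lists m(e . cell)
    for the eight cells in the order of KEYS."""
    cells = dict(zip(KEYS, weights))
    def m(pred):
        return sum(w for k, w in cells.items() if pred(*k))
    def r(x):
        return (m(lambda i, j, h: x(i, j) and h) * m(lambda i, j, h: True)
                - m(lambda i, j, h: x(i, j)) * m(lambda i, j, h: h))
    return r

I = lambda i, j: i
J = lambda i, j: j
I_or_J = lambda i, j: i or j
I_and_J = lambda i, j: i and j

# Hypothetical weights; the first two cells are e.i.j.h and e.i.j.~h.
general = [Fraction(k, 100) for k in (5, 7, 11, 2, 3, 13, 17, 19)]
exclusive = [Fraction(k, 100) for k in (0, 0, 11, 2, 3, 13, 17, 19)]

r = make_r(general)          # no restriction on i and j
r_excl = make_r(exclusive)   # m(e.i.j) = 0, the condition of T1b
print(r(I_or_J), r(I) + r(J) - r(I_and_J))
print(r_excl(I_or_J), r_excl(I) + r_excl(J))
```

In the second weighting the subtracted term r(i . j,h,e) vanishes, so the simple sum already gives the r-value of the disjunction.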


One of the characteristics of those numerical functions of classes which
are called measure functions is the following property: for any such
function, its value for the class-sum of two classes is always the sum of its
values for the two classes minus its value for the class-product; it is clear
that the latter value must be subtracted, because by summing the measures
of the two classes their common part has been counted twice. This
is the reason for the general addition theorem concerning m (T57-1k); this
theorem was the basis for the general addition theorem concerning c
(T59-1k) and is here the basis for the general addition theorem concerning
r (T1a). We shall see later that the necessity of the subtraction of the
last term in T1a brings about the possibility of the case (2) mentioned
earlier: even if both i and j are positive to h on e, i V j may be negative.
We see from T1a that this would happen if the r-values for i, j, and i . j
are positive and the last one greater than the sum of the first and second.
The question whether and under what conditions this can occur will be
examined later.

In the case of mutually exclusive classes, the measure functions have
simple additivity: the value for the class-sum is the sum of the values of
the two classes; this is the chief characteristic of measure functions.
Therefore we had, under the condition of L-exclusivity of the sentences,
which corresponds to the exclusivity of the ranges, the special addition
theorem for m (T57-1m). Based on it are the special addition theorem
for c (T59-1l) and now that for r (T1b). In addition to these theorems for
simple disjunction, we have the special addition theorems for multiple
disjunction concerning m (T57-1v), c (T59-1m), and r (T1c).


T68-2. Additivity for conjunctions.
a. r(i . j,h,e) = r(i,h,e) + r(j,h,e) − r(i V j,h,e). (From T1a.)
+b. Let m(e . ~i . ~j) = 0. (This is the case in particular if ⊢ e ⊃ i V j,
in other words, if i and j are L-disjunct with respect to e.) Then
r(i . j,h,e) = r(i,h,e) + r(j,h,e).
Proof. m(e . ~(i V j)) = 0. Therefore r(i V j,h,e) = 0 (T67-6e). Hence the
assertion by (a).
c. Let i be a conjunction with n (≥ 2) components: i_1 . i_2 . ... . i_n.
For any two distinct components i_m and i_p, let m(e . ~i_m . ~i_p) = 0.
(This is the case if i_m and i_p are L-disjunct with respect to e, i.e.,
⊢ e ⊃ i_m V i_p.) Then r(i,h,e) = Σ r(i_p,h,e), summed over p = 1 to n.


Proof, from T1c by substituting ‘~i’ for ‘i’ and ‘~i_p’ (for every p = 1
to n) for ‘i_p’, and T67-5a.

T2a is still analogous to theorems on m (T57-1l) and c (T59-1q).
However, the analogy does not go farther. For r we have here a special
addition theorem for conjunction (T2b), while the corresponding theorem for
m (T57-1t) has not this simple form; the reason for this difference is that
if i and j are L-disjunct, i V j is L-true, and hence r(i V j,h,e) = 0 while
m(i V j) is not 0 but 1. The corresponding theorem for c (T59-1r) is
analogous to that for m, hence likewise not as simple as that for r. Thus the
result is that, under suitable conditions, r is additive with respect to
disjunctions, like m and c, but further also additive with respect to
conjunctions, in distinction to m and c.
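The contrast between r and m with respect to conjunctions can be illustrated in the same manner. In this sketch, whose weights and helper names are hypothetical, the two cells belonging to ~i . ~j receive the weight 0, so that i and j are L-disjunct with respect to e in the sense required by T2b.

```python
from fractions import Fraction
from itertools import product

KEYS = list(product([True, False], repeat=3))  # truth-values of (i, j, h)
# Hypothetical m-values of e . cell; the last two cells (~i.~j) are empty.
WEIGHTS = [Fraction(k, 100) for k in (5, 7, 11, 2, 3, 13, 0, 0)]
CELLS = dict(zip(KEYS, WEIGHTS))

def m(x, with_h=False):
    """m(e . x), or m(e . x . h) if with_h is set; x is a function of (i, j)."""
    return sum(w for (i, j, h), w in CELLS.items()
               if x(i, j) and (h or not with_h))

def r(x):
    """r(x,h,e) by D67-1."""
    everything = lambda i, j: True
    return (m(x, with_h=True) * m(everything)
            - m(x) * m(everything, with_h=True))

I = lambda i, j: i
J = lambda i, j: j
I_and_J = lambda i, j: i and j
I_or_J = lambda i, j: i or j

print(r(I_and_J), r(I) + r(J))   # equal, by T2b
print(m(I_and_J), m(I) + m(J))   # not equal: m lacks this additivity
```

The difference between the two m-values in the last line is m(e . (i V j)), which here equals m(e), whereas the corresponding r-value is 0.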

Suppose two sentences i and j are given. Let us consider the four
sentences i . j, i . ~j, ~i . j, and ~i . ~j. They correspond to the four
lines of the truth-table for i and j (see the explanation preceding T21-7).
Let l be any non-L-false sentence constructed out of i and j with the
help of any connectives. Then l can be transformed into a disjunction of
n of the four sentences (1 ≤ n ≤ 4) (T21-7d); for example, i V j is
L-equivalent to the disjunction of the first three of them. Since the four
sentences are L-exclusive in pairs (T21-7a), r(l,h,e) can be represented as a
sum of some of the r-values for the four sentences. Thus these four
r-values are a convenient basis for studying the relations between the
r-values of i, j, and their connections. This method will be applied in T3.

We shall use in this section and the next one the following
abbreviations for the four r-values:

r_1 = r(i . j,h,e),
r_2 = r(i . ~j,h,e),
r_3 = r(~i . j,h,e),
r_4 = r(~i . ~j,h,e).

T68-3. Let e, h, i, j be sentences in 𝔏.

a. r_1 + r_2 + r_3 + r_4 = 0.

Proof. Let k be (i . j) V (i . ~j) V (~i . j) V (~i . ~j). The components
of this disjunction are L-exclusive in pairs (T21-7a), hence also
L-exclusive in pairs with respect to e. Therefore (T1c) r(k,h,e) is the sum
of the four r-values as stated above. On the other hand, k is L-true
(T21-7b), hence ⊢ e ⊃ k. Therefore r(k,h,e) = 0 (T67-6e).

b. r(i,h,e) = r_1 + r_2.
Proof. i is L-equivalent to (i . j) V (i . ~j) (T21-5j(2)). Hence the
assertion by T1b.


c. r(j,h,e) = r_1 + r_3. (Analogous to (b).)
d. (1) r(i V j,h,e) = r_1 + r_2 + r_3;
(2) = −r_4.
Proof. 1. i V j is L-equivalent to (i . j) V (i . ~j) V (~i . j) (T21-7d).
Hence the assertion by T1c. 2. From (1) and (a).
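The use of the four values r_1 to r_4 as a basis can be sketched as follows. The code is an illustration under assumptions: the weights are hypothetical, a connection of i and j is represented by the lines of the truth-table on which it holds, and the equivalence i ≡ j is added as a further example of a connection not treated in T3.

```python
from fractions import Fraction

def r_of(cells, chosen):
    """r(x,h,e) by D67-1, where x is the disjunction of the chosen lines of
    the truth-table for i and j, and
    cells[k] = (m(e . line_k . h), m(e . line_k . ~h))."""
    m_e = sum(a + b for a, b in cells)
    m_eh = sum(a for a, _ in cells)
    m_exh = sum(cells[k][0] for k in chosen)
    m_ex = sum(sum(cells[k]) for k in chosen)
    return m_exh * m_e - m_ex * m_eh

# Lines 0..3 stand for i.j, i.~j, ~i.j, ~i.~j; the weights are hypothetical.
cells = [(Fraction(5, 100), Fraction(7, 100)),
         (Fraction(11, 100), Fraction(2, 100)),
         (Fraction(3, 100), Fraction(13, 100)),
         (Fraction(17, 100), Fraction(19, 100))]

r1, r2, r3, r4 = (r_of(cells, [k]) for k in range(4))
r_i = r_of(cells, [0, 1])            # i holds on lines 0 and 1
r_j = r_of(cells, [0, 2])            # j holds on lines 0 and 2
r_i_or_j = r_of(cells, [0, 1, 2])    # i V j holds on lines 0, 1, 2
r_i_equiv_j = r_of(cells, [0, 3])    # i ≡ j holds on lines 0 and 3
print(r1 + r2 + r3 + r4, r_i - (r1 + r2), r_i_or_j + r4)
```

Each of the three printed values is 0, in agreement with T3a, T3b, and T3d(2).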


If any deductive relations (L-concepts) hold between i and j, then one
or two or three of the four sentences i . j, i . ~j, ~i . j, and ~i . ~j are
L-false (for example, if ⊢ i ⊃ ~j, i . j is L-false; if ⊢ i ≡ j, then i . ~j
and ~i . j are L-false); then for these L-false sentences r = 0 (T67-6d).
This provides a simple method for studying how deductive relations
between i and j, and also e, affect the r-values of i, j, and their
connections. This method will be applied in T4. There we shall use not the
strong deductive condition that a certain sentence is L-false but rather the
condition that its m-value is 0; the latter condition is weaker in 𝔏_∞.


T68-4.

a. Let m(e . i . j) = 0. (This is the case in particular if e . i . j is
L-false, in other words, ⊢ e . i ⊃ ~j, i and j are L-exclusive with
respect to e.) Then the following holds.
(1) r_1 = 0. (From T67-6d.)
(2) r_2 + r_3 + r_4 = 0. (From T3a, (1).)
(3) r(i,h,e) = r_2. (From T3b, (1).)
(4) r(j,h,e) = r_3. (From T3c, (1).)
(5) r(i V j,h,e) = r_2 + r_3 = −r_4. (From T3d, (1).)

b. Let m(e . i . ~j) = 0. (This is the case if e . i . ~j is L-false, hence
⊢ e . i ⊃ j.)
(1) r_2 = 0. (From T67-6d.)
(2) r_1 + r_3 + r_4 = 0. (From T3a, (1).)
(3) r(i,h,e) = r_1. (From T3b, (1).)
(4) r(j,h,e) = r(i V j,h,e) = r_1 + r_3 = −r_4. (From T3c, T3d, (1).)

c. Let m(e . ~i . j) = 0. (This is the case if e . ~i . j is L-false, hence
⊢ e . j ⊃ i.)
(1) r_3 = 0. (From T67-6d.)
(2) r_1 + r_2 + r_4 = 0. (From T3a, (1).)
(3) r(j,h,e) = r_1. (From T3c, (1).)
(4) r(i,h,e) = r(i V j,h,e) = r_1 + r_2 = −r_4. (From T3b, T3d, (1).)

d. Let m(e . ~i . ~j) = 0. (This is the case if e . ~i . ~j is L-false,
in other words, ⊢ e ⊃ i V j, i and j are L-disjunct with respect to e.)
(1) r_4 = 0. (From T67-6d.)
(2) r_1 + r_2 + r_3 = 0. (From T3a, (1).)
(3) r(i,h,e) = r_1 + r_2 = −r_3. (From T3b, (2).)
(4) r(j,h,e) = r_1 + r_3 = −r_2. (From T3c, (2).)
(5) r(i V j,h,e) = 0. (From T3d(1), (2).)

e. Let m(e . i . j) = m(e . i . ~j) = 0; in other words, m(e . i) = 0.
(This is the case if e . i is L-false, hence ⊢ e ⊃ ~i.)
(1) r_1 = r_2 = 0. (From (a)(1), (b)(1).)
(2) r_3 + r_4 = 0. (From T3a, (1).)
(3) r(i,h,e) = 0. (From (a)(3), (1).)
(4) r(j,h,e) = r(i V j,h,e) = r_3 = −r_4. (From (b)(4), (1).)

f. Let m(e . i . j) = m(e . ~i . j) = 0; in other words, m(e . j) = 0.
(This is the case if e . j is L-false, hence ⊢ e ⊃ ~j.)
(1) r_1 = r_3 = 0. (From (a)(1), (c)(1).)
(2) r_2 + r_4 = 0. (From T3a, (1).)
(3) r(j,h,e) = 0. (From (c)(3), (1).)
(4) r(i,h,e) = r(i V j,h,e) = r_2 = −r_4. (From (c)(4), (1).)

g. Let m(e . i . j) = m(e . ~i . ~j) = 0; in other words, m(e . (i ≡ j))
= 0. (This is the case if e . (i ≡ j) is L-false, hence ⊢ e ⊃ (i ≡ ~j).)
(1) r_1 = r_4 = 0. (From (a)(1), (d)(1).)
(2) r_2 + r_3 = 0. (From T3a, (1).)
(3) r(i,h,e) = r_2 = −r_3. (From (d)(3), (1).)
(4) r(j,h,e) = r_3 = −r_2 = −r(i,h,e). (From (d)(4), (1).)
(5) r(i V j,h,e) = 0. (From (d)(5).)

h. Let m(e . i . ~j) = m(e . ~i . j) = 0; in other words,
m(e . ~(i ≡ j)) = 0. (This is the case if e . ~(i ≡ j) is L-false, hence
⊢ e ⊃ (i ≡ j), i and j are L-equivalent with respect to e.)
(1) r_2 = r_3 = 0. (From (b)(1), (c)(1).)
(2) r_1 + r_4 = 0. (From T3a, (1).)
(3) r(i,h,e) = r(j,h,e) = r(i V j,h,e) = r_1 = −r_4. (From (b)(3),
(b)(4), (1).)

i. Let m(e . i . ~j) = m(e . ~i . ~j) = 0; in other words, m(e . ~j) =
0. (This is the case if e . ~j is L-false, hence ⊢ e ⊃ j.)
(1) r_2 = r_4 = 0. (From (b)(1), (d)(1).)
(2) r_1 + r_3 = 0. (From T3a, (1).)
(3) r(i,h,e) = r_1 = −r_3. (From (b)(3), (2).)
(4) r(j,h,e) = r(i V j,h,e) = 0. (From (b)(4), (2).)

j. Let m(e . ~i . j) = m(e . ~i . ~j) = 0; in other words, m(e . ~i)
= 0. (This is the case if e . ~i is L-false, hence ⊢ e ⊃ i.)
(1) r_3 = r_4 = 0. (From (c)(1), (d)(1).)
(2) r_1 + r_2 = 0. (From T3a, (1).)
(3) r(i,h,e) = r(i V j,h,e) = 0. (From (c)(4), (2).)
(4) r(j,h,e) = r_1 = −r_2. (From (c)(3), (2).)

k. Let any three of the four sentences i . j, i . ~j, ~i . j, and ~i . ~j
be selected. Let m = 0 for the three conjunctions of e with each of
the selected sentences. (This is the case if these three conjunctions
are L-false, and hence e L-implies the one sentence among the four
which has not been selected.)
(1) r_1 = r_2 = r_3 = r_4 = 0.
Proof. For each of the three selected sentences, r = 0 (T67-6d). Therefore
the same holds for the one remaining sentence (T3a).
(2) For i, j, i V j, and i . j (to h on e), r = 0. (From T3b, c, d,
and (1).)

l. Let m(e . i . j) = m(e . i . ~j) = m(e . ~i . j) = m(e . ~i . ~j) = 0;
in other words, m(e) = 0. (This is the case if e is L-false.)
(1) r_1 = r_2 = r_3 = r_4 = 0.
(2) For i, j, i V j, and i . j (to h on e), r = 0. (From (k); or directly
from T67-6b.)


T4 has dealt with all those cases where one, two, three, or all four of
the sentences e . i . j, e . i . ~j, e . ~i . j, and e . ~i . ~j have the
m-value 0, which includes the cases where these sentences are L-false.
Thereby all possibilities of deductive relations (L-concepts) between i and
j on the basis e are dealt with, in other words, all nonquantitative relations
(like inclusion, exclusion, emptiness, etc.) between those parts of R_i and
R_j which are within R_e. The purpose of T3 and T4 is this. If the relevance
measures of the four L-exclusive sentences i . j, i . ~j, ~i . j, and
~i . ~j are given, then the theorems determine the relevance measures
for i, j, and i V j, and enable us to find easily those for any other
connections of i and j. T3 does this in general, and T4 for all cases of
deductive relations.


§ 69. The Possible Relevance Situations for Two Observations and Their 
Connections 


Suppose that the relevance situation for four sentences e, h, i, and j is
described in the following way: for each of the sentences i . j, i . ~j,
~i . j, ~i . ~j, i, j, and i V j, not the numerical value of r (to h on e) is
given, but merely the sign of r, that is to say, it is stated whether r > 0,
< 0, or = 0. (These indications for the seven sentences are, of course, not
independent of each other.) A table (T1a) is given which contains a complete
list of all seventy-five possible relevance situations thus described in
terms of signs of r. With the help of this table, general theorems about
possible relevance situations are derived, first in terms of signs of r (T2),
and then in terms of the relevance concepts (T3). Four kinds of relevance
situations, whose possibility seems surprising at first glance, are studied
more in detail; among them are the following: (1a) i and j are both positive
(to h on e) but i . j is negative; (2a) i and j are both positive but i V j is
negative. These situations are illustrated and made plausible by examples
with numerical values. Finally, the following is shown by a general theorem
(T5) and by examples: if it is known that j is L-implied either by i alone or
by e . i, then from the relevance of i (to h on e) nothing can be inferred
concerning the relevance of j, or vice versa.


The problems which we have discussed in the preceding section and
shall further discuss here concern the following situation. The prior
evidence e is given; the hypothesis h is considered; i and j are pieces of
additional evidence, for example, reports of new observations. The questions
to be answered concern the relevance to h on e of i and j and their
connections. We had two theorems (T68-3 and T68-4) which state the
relations between the relevance measures of the four sentences i . j, i . ~j,
~i . j, and ~i . ~j, on the one hand, and those of i, j, and i V j, on the
other. We shall now turn to nonnumerical questions of relevance; that is
to say, we ask for each sentence l among those mentioned, not what is
its relevance measure to h on e, but merely whether l is positive, negative,
or irrelevant to h on e. Or, more exactly, we ask not about the numerical
value of r for l (to h on e) but merely for what we shall call the sign of r,
that is, whether r for l is > 0, < 0, or 0. These three cases correspond in
general to positive relevance, negative relevance, and irrelevance.
[However, as we have seen earlier, this correspondence holds without
restrictions only in 𝔏_N and for nongeneral sentences in 𝔏_∞. If e . l is an
almost L-false sentence in 𝔏_∞ (and hence contains a variable, T58-3e), then
m(e . l) = 0 and hence m(e . l . h) = 0 and hence r(l,h,e) = 0 (D67-1);
nevertheless, l is in this case not necessarily irrelevant to h on e but may
be positive or negative.]

We shall now consider the possible relevance situations to h on e for
the sentences mentioned above (viz., i . j, i . ~j, ~i . j, ~i . ~j, i, j,
and i V j), characterized in a nonnumerical way by the signs of r for these
sentences. With the help of the theorems in the preceding section it will
now be possible to state a complete list of all possible relevance situations
in this sense for the sentences mentioned; their number is 75. This list will
be given in the subsequent table T1a. The table contains, aside from the
enumeration at the left-hand side, seven columns for the seven sentences
mentioned above. (At present, we pay attention only to those headings
of the columns which are given on the line a; the line b refers to another
interpretation T1b of the same table, to be discussed later (§ 71).) The
table is constructed in such a manner that every case listed is possible
and that no possible case is omitted. It will later provide the basis for
general theorems on possible relevance situations in terms of relevance
concepts (T3).


The procedure for constructing the table T1a is as follows. We begin by filling
the columns (1) to (4) only. ‘+’, ‘−’, and ‘0’ mean that r > 0, r < 0, and
r = 0, respectively, for the sentence indicated at the head of the column,
always to h on e. We list all those distributions of ‘+’, ‘−’, and ‘0’ among the
four sentences which are possible; these distributions are those which satisfy
the following rule:


R1. If ‘+’ occurs in one of the columns (1) to (4), then ‘−’ must occur in
    another of these columns; if ‘−’ occurs in one, ‘+’ must occur in
    another.


This follows from T68-3a. r1, r2, r3, and r4 in this theorem are the r-values for
the four sentences to which the columns refer. T68-3a says that the sum of
these four values is 0; therefore, if one value is > 0, another is < 0, and vice
versa. Therefore, there are fourteen distributions without ‘0’ (Nos. 1 to 14),
because there are altogether sixteen distributions of two values among four
items (T40-31e), and two of them are here excluded by R1, viz., ‘++++’
and ‘−−−−’. Now we come to the cases where r = 0 for just one of the four
sentences. If r = 0 for (1) only, we have six cases for (2), (3), and (4) (Nos.
15-20), because there are eight distributions of ‘+’ and ‘−’ (T40-31e) and
again two of them are excluded by R1, viz., ‘+++’ and ‘−−−’. There are
likewise six cases each if r = 0 for (2) only, or for (3) only, or for (4) only
(Nos. 21 to 38). Then we have the cases where r = 0 for two sentences (Nos.
39-50); here, according to R1, one of the two remaining sentences must have
‘+’ and the other ‘−’. That exactly three sentences have ‘0’ is excluded by R1.
Hence there remains only the last case (No. 51), where all four sentences
have ‘0’.
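The counting argument for columns (1) to (4) can be retraced mechanically. The following is only a minimal sketch: the ASCII characters '+', '-', and '0' stand in for the three signs, and the function names are illustrative, not part of the system.

```python
from collections import Counter
from itertools import product

SIGNS = "+-0"  # ASCII stand-ins for the table's three signs


def satisfies_r1(dist):
    """R1: '+' occurs somewhere if and only if '-' occurs somewhere."""
    return ("+" in dist) == ("-" in dist)


def admissible_distributions():
    """All sign distributions over columns (1) to (4) permitted by R1."""
    return [d for d in product(SIGNS, repeat=4) if satisfies_r1(d)]


dists = admissible_distributions()
by_zero_count = Counter(d.count("0") for d in dists)

print(len(dists))                     # total number of numbered cases
print(sorted(by_zero_count.items()))  # cases grouped by number of '0' signs
```

Grouping by the number of ‘0’ signs reproduces the blocks of the enumeration: no ‘0’, exactly one ‘0’, exactly two, and all four, with no distribution having exactly three.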

Now we turn to column (5) for i. Here we have to determine the sign of r
for i on the basis of the earlier columns. T68-3b says that the r-value for i is the
sum of the r-values for the sentences (1) and (2). Thus we have the following
rules for filling in column (5) (R5 will presently be explained):


R2. If in columns (1) and (2) we find ‘++’ or ‘+0’ or ‘0+’, we write in
    column (5) ‘+’.
R3. If we find ‘−−’ or ‘−0’ or ‘0−’, we write ‘−’.
R4. If we find ‘00’, we write ‘0’.
R5. Suppose we find in columns (1) and (2) one ‘+’ and one ‘−’.
    a. If there is still another ‘+’ (in column (3) or (4)) but no other ‘−’,
       we write in column (5) ‘−’.
    b. If there is still another ‘−’ but no other ‘+’, we write ‘+’.
    c. If we find ‘0’ in both (3) and (4), we write ‘0’ in (5).
    d. If we find in (3) and (4) one ‘+’ and one ‘−’, then there are for (5)
       three possibilities, ‘+’, ‘−’, and ‘0’. This occurs in Nos. 5, 6, 9,
       and 10.


R2, R3, and R4 are obvious, in view of T68-3b. R5a applies if there is one
‘−’ but two or three ‘+’ (and hence one or no ‘0’). Consider the case where
r1 is negative, say, −r′1, r2 and r3 are positive, say, r′2 and r′3, and r4 is either
positive, say, r′4, or 0. Then (T68-3a) r′1 = r′2 + r′3 or r′2 + r′3 + r′4, hence
r′1 > r′2; therefore r for i, which is r1 + r2 (T68-3b) = −r′1 + r′2, is negative.
R5b is analogous to R5a. In the case of R5c, r1 + r2 = 0 (T68-3a). In the case
of R5d, we see easily that r for i may be > 0 or < 0 or = 0. As an example,
take case No. 5. If the r-values for the sentences (1) to (4) are, say, 2r, −r, r,
and −2r, respectively, where r is any positive real number, then r for i, which
is the sum of the first two values, is r, hence > 0; if the four values are r, −2r, 2r,
−r, then r for i is −r, hence < 0; if those values are r, −r, 2r, −2r, then r
for i is 0.

Now we come to column (6) for j. According to T68-3c, the r-value for j is
the sum of the values for the sentences (1) and (3). Thus the procedure is here
analogous to that for column (5). Here likewise, we have sometimes all three
possibilities: r for j may be > 0, < 0, or 0; this occurs in cases Nos. 3, 6, 9,
and 12. In two of these cases, Nos. 6 and 9, we had already three possibilities
for i in (5). In these cases the three possibilities for j in (6) are independent
of the three possibilities for i in (5); that is to say, all nine combinations are
possible. This is shown for the nine combinations A to I in No. 6 by the following
nine examples; they are to be understood in the same manner as the above
examples for the application of R5d to No. 5; examples for No. 9 can easily be
constructed analogously.


         (1)     (2)     (3)     (4)       (5)          (6)
          +       −       −       +     = (1)+(2)    = (1)+(3)
6A       3r     −2r     −2r       r         r            r
 B       2r      −r     −3r      2r         r           −r
 C       2r      −r     −2r       r         r            0
 D       2r     −3r      −r      2r        −r            r
 E        r     −2r     −2r      3r        −r           −r
 F        r     −2r      −r      2r        −r            0
 G       2r     −2r      −r       r         0            r
 H        r      −r     −2r      2r         0           −r
 I        r      −r      −r       r         0            0
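The nine examples for No. 6 can be checked against the constraints they are meant to satisfy. The sketch below takes r = 1 and lists the nine rows as they are read here from the table (an assumption wherever the printed digits are unclear); it verifies that each row sums to 0 (T68-3a), has the signs + − − + in columns (1) to (4), and that the sign pairs for columns (5) and (6) are all different.

```python
# Rows A to I for case No. 6 as multiples of r (here r = 1); columns (1) to (4).
EXAMPLES = {
    "A": (3, -2, -2, 1),
    "B": (2, -1, -3, 2),
    "C": (2, -1, -2, 1),
    "D": (2, -3, -1, 2),
    "E": (1, -2, -2, 3),
    "F": (1, -2, -1, 2),
    "G": (2, -2, -1, 1),
    "H": (1, -1, -2, 2),
    "I": (1, -1, -1, 1),
}


def sign(x):
    return "+" if x > 0 else "-" if x < 0 else "0"


def check(examples):
    """Return the sign pair (column 5, column 6) for every row."""
    combos = {}
    for label, (r1, r2, r3, r4) in examples.items():
        assert r1 + r2 + r3 + r4 == 0, label              # T68-3a
        assert tuple(map(sign, (r1, r2, r3, r4))) == ("+", "-", "-", "+"), label
        combos[label] = (sign(r1 + r2), sign(r1 + r3))    # T68-3b, T68-3c
    return combos


combos = check(EXAMPLES)
for label, pair in combos.items():
    print(label, pair)
```

Negating every entry gives corresponding examples for No. 9.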


Finally we fill in column (7) for i V j according to the following rule:

R6. If in column (4) we find ‘+’, ‘−’, or ‘0’, then we write in column (7)
    ‘−’, ‘+’, or ‘0’, respectively.


This follows from the fact that r for i V j is −r4 (T68-3d(2)). R6 determines
every item in column (7) uniquely.
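The whole construction of columns (1) to (7) can be imitated by brute force. The sketch below rests on an assumption not made in the text: that small integers suffice as sample r-values (the examples above use multiples of r up to 3r). It collects every distinct row of signs and counts them, so the result can be compared with the number of situations stated earlier in this section.

```python
from itertools import product


def sign(x):
    return "+" if x > 0 else "-" if x < 0 else "0"


def relevance_situations(bound=3):
    """Distinct sign rows over columns (1) to (7) for integer sample values."""
    rows = set()
    for r1, r2, r3 in product(range(-bound, bound + 1), repeat=3):
        r4 = -(r1 + r2 + r3)          # T68-3a: the four values sum to 0
        columns = (r1, r2, r3, r4,
                   r1 + r2,           # column (5), T68-3b
                   r1 + r3,           # column (6), T68-3c
                   -r4)               # column (7), T68-3d(2)
        rows.add(tuple(sign(v) for v in columns))
    return rows


rows = relevance_situations()
print(len(rows))                       # distinct situations, subcases included
print(len({row[:4] for row in rows}))  # distinct numbered cases
```

The second count ignores columns (5) to (7) and therefore counts only the numbered cases; the difference between the two counts comes from the cases falling under R5d and its analogue for column (6).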


T69-1. The possible relevance situations.

    a. Signs of r for i, j, and their connections, to h on e.
    b. Signs of r for i to h, k, and their connections, on e.
    ‘+’, ‘−’, and ‘0’ indicate that r > 0, r < 0, and r = 0, respectively.

         (1)      (2)       (3)       (4)       (5)   (6)    (7)
T1a.    i . j    i . ~j    ~i . j    ~i . ~j     i     j    i V j
T1b.    h . k    h . ~k    ~h . k    ~h . ~k     h     k    h V k
 33       +        +         −         0         +     −      0
 34       +        −         +         0         −     +      0
 35       +        −         −         0         +     +      0
 36       −        +         +         0         −     −      0
 37       −        +         −         0         +     −      0
 38       −        −         +         0         −     +      0
 39       0        0         +         −         0     +      +
 40       0        0         −         +         0     −      −
 41       0        +         0         −         +     0      +
 42       0        −         0         +         −     0      −
 43       0        +         −         0         +     −      0
 44       0        −         +         0         −     +      0
 45       +        0         0         −         +     +      +
 46       −        0         0         +         −     −      −
 47       +        0         −         0         +     0      0
 48       −        0         +         0         −     0      0
 49       +        −         0         0         0     +      0
 50       −        +         0         0         0     −      0
 51       0        0         0         0         0     0      0


We can read from the table T1a which combinations of signs of r are
possible for i, j, i . j, and i V j. Thus we find the results stated in the
following theorem T2; it serves chiefly as a lemma for T3, which deals
with the possible combinations of relevance properties for those sentences.


T69-2. Let four sentences e, h, i, and j in 𝔏 be given. (a), (b), (c), and (d)
deal with four cases concerning the signs of r for i and for j; they are
always meant to h on e. It is easily seen that for any four sentences exactly
one of these cases (a) to (d) applies.

   a. Let r either be > 0 for both i and j, or > 0 for one of them and 0 for
      the other. Then the following holds.
      (1) For at least one of the sentences i . j and i V j, r > 0.
      (2) If for i . j r > 0, then for i V j r may be > 0, < 0, or 0.
      (3) If for i V j r > 0, then for i . j r may be > 0, < 0, or 0.
      (4) Let m(e . i . j) = 0. (This is the case in particular if e . i . j is
          L-false, in other words, i and j are L-exclusive with respect
          to e.) Then for i . j r = 0, and for i V j r > 0. (From T67-6d;
          T68-1b.)
      (5) Let m(e . ~i . ~j) = 0. (This is the case if e . ~i . ~j is
          L-false, hence ⊢ e ⊃ i V j.) Then for i V j r = 0, and for i . j
          r > 0. (From T67-6e; T68-2b.)

   b. Let r either be < 0 for both i and j, or < 0 for one of them and 0 for
      the other. Then the following holds.
      (1) For at least one of the sentences i . j and i V j, r < 0.
      (2) If for i . j r < 0, then for i V j r may be > 0, < 0, or 0.
      (3) If for i V j r < 0, then for i . j r may be > 0, < 0, or 0.
      (4) Let m(e . i . j) = 0. Then for i . j r = 0, and for i V j r < 0.
          (From T67-6d; T68-1b.)
      (5) Let m(e . ~i . ~j) = 0. Then for i V j r = 0, and for i . j
          r < 0. (From T67-6e; T68-2b.)

   c. Let r be > 0 for one of the sentences i and j and < 0 for the other.
      Then for i . j r may be > 0, < 0, or 0, and the same holds for i V j,
      independently of i . j; that is to say, all nine combinations are
      possible.

   d. Let r = 0 for both i and j. Then either r = 0 for both i . j and
      i V j, or r > 0 for one and r < 0 for the other; in the latter case, the
      one r-value is the opposite of the other. (From T68-1a.)

   (All items for which no references to other theorems are given can
   easily be established by scanning the list T1a; exact proofs are based
   on T68-3.)
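The parts of T2 that are read off the list can also be checked by exhaustive search over sample values. This is a sketch under stated assumptions only: integer r-values within a small bound serve as samples, r for i . j is taken as r1, and r for i V j as −r4, as in the construction of the table. The asserts cover (a)(1), (b)(1), and (d); the returned set collects the sign pairs found in case (c).

```python
from itertools import product


def sign(x):
    return "+" if x > 0 else "-" if x < 0 else "0"


def check_t2(bound=3):
    """Check T2 (a)(1), (b)(1), (d) on samples; return sign pairs seen in (c)."""
    mixed = set()
    for r1, r2, r3 in product(range(-bound, bound + 1), repeat=3):
        r4 = -(r1 + r2 + r3)
        r_i, r_j = r1 + r2, r1 + r3      # columns (5) and (6)
        r_conj, r_disj = r1, -r4         # i . j is (1); i V j is column (7)
        lo, hi = min(r_i, r_j), max(r_i, r_j)
        if lo >= 0 and hi > 0:           # case (a)
            assert r_conj > 0 or r_disj > 0
        elif hi <= 0 and lo < 0:         # case (b)
            assert r_conj < 0 or r_disj < 0
        elif lo < 0 < hi:                # case (c)
            mixed.add((sign(r_conj), sign(r_disj)))
        else:                            # case (d): r = 0 for both i and j
            assert r_conj == -r_disj
    return mixed


mixed = check_t2()
print(sorted(mixed))  # sign pairs for (i . j, i V j) occurring in case (c)
```

A search of this kind illustrates the theorem on samples; it does not replace the proofs based on T68-3.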

From T2 we derive the analogous theorem T3. While the former deals 
with the three signs of r, the latter deals with the three corresponding rele- 
vance properties, viz., positive relevance, negative relevance, and irrele- 
vance. From the point of view of application, the latter concepts and hence 
the theorem T3 dealing with them may perhaps be more interesting. How- 
ever, T3 cannot be as simple as T2 but must contain restricting condi- 
tions at certain points, because the relevance properties do not always 
correspond to the sign of r. We require in T3 that (A) e . i is not almost
L-false, and (B) e . j is not almost L-false. (A) means that either e . i is
L-false or m(e . i) > 0; (B) means that either e . j is L-false or
m(e . j) > 0. It follows from (A) and (B) that e . (i V j) is either L-false
or its m > 0; in other words, that (C) e . (i V j) is not almost L-false.

Proof. Let e . (i V j) be not L-false. It is L-equivalent to (e . i) V (e . j).
Hence at least one of the sentences e . i and e . j is not L-false (T20-2q).
Therefore at least one of their m-values is > 0. (From (A), (B).) Hence m > 0 for
their disjunction (T57-1j) and hence for e . (i V j).


The conditions (A), (B), and (C) are needed for the use of T67-9 in
the proofs (in particular, (C) for the application of T67-9d to i V j in the
proof of T3a(3)).


+T69-3. Let e, h, i, and j be sentences in 𝔏. For a to c it is assumed that
none of the following sentences is almost L-false: (A) e . i, (B) e . j, and
hence (C) e . (i V j). Relevance and irrelevance are here always meant
to h on e.

   a. Let i and j be either both positive, or one of them positive and the
      other irrelevant. Then the following holds.
      (1) Lemma. Either both r(i,h,e) and r(j,h,e) are > 0, or one of them
          is > 0 and the other 0. (From T67-9a, T67-8d.)
      (2) At least one of the sentences i . j and i V j is positive. (From (1),
          T2a(1), T67-8a.)
      (3) If i . j is positive and m(e . i . j) > 0, then for i V j all three
          cases are possible, that is to say, it may be positive, negative,
          or irrelevant.
          Proof. For i . j r > 0 (T67-9a, since m(e . i . j) > 0). Hence for
          i V j r may be > 0, < 0, or 0 ((1), T2a(2)). Hence the assertion
          by T67-8a and b, T67-9d.
      (4) If i V j is positive and m(e . i . j) > 0, then for i . j all three
          cases are possible. (From T67-9a, (1), T2a(3), T67-8a and b,
          T67-9d; in analogy to (3).)
      (5) Let m(e . i . j) = 0. [This holds in particular if e . i . j is L-false,
          hence ⊢ e ⊃ ~(i . j); in this case, i . j is irrelevant.] Then i V j is
          positive. (From (1), T2a(4), T67-8a.)
      (6) Let m(e . ~i . ~j) = 0. [This holds in particular if e . ~i . ~j
          and hence e . ~(i V j) is L-false, hence ⊢ e ⊃ i V j; in this case,
          i V j is irrelevant.] Then i . j is positive. (From (1), T2a(5),
          T67-8a.)

   b. Let i and j be either both negative, or one of them negative and the
      other irrelevant. Then the following holds.
      (1) Lemma. Either both r(i,h,e) and r(j,h,e) are < 0, or one is < 0
          and the other 0. (From T67-9b, T67-8d.)
      (2) At least one of the sentences i . j and i V j is negative. (From (1),
          T2b(1), T67-8b.)
      (3) If i . j is negative and m(e . i . j) > 0, then for i V j all three
          cases are possible. (From T67-9b, (1), T2b(2), T67-8a and b,
          T67-9d; in analogy to (a)(3).)
      (4) If i V j is negative and m(e . i . j) > 0, then for i . j all three
          cases are possible. (From T67-9b, (1), T2b(3), T67-8a, b,
          T67-9d.)
      (5) Let m(e . i . j) = 0. [See remark in (a)(5).] Then i V j is negative.
          (From (1), T2b(4), T67-8b.)
      (6) Let m(e . ~i . ~j) = 0. [See remark in (a)(6).] Then i . j is
          negative. (From (1), T2b(5), T67-8b.)

   c. Let one of the sentences i and j be positive and the other negative.
      Then for i . j all three cases are possible, and the same holds for
      i V j independently of i . j; that is to say, all nine combinations are
      possible. (From T67-8a and b, T2c, T67-9d.)

   d. Let both i and j be irrelevant and m(e . i . j) > 0. Then i . j and
      i V j are either both irrelevant or one is positive and the other
      negative. (From T67-8d, T2d, T67-9d, T67-8a and b.)

T3 gives account of all possible relevance situations described in terms 
of the relevance concepts: positive and negative relevance and irrelevance. 
We shall now study four cases whose possibility seems surprising at first. 
The cases 1a and 2a here are the cases 1 and 2 mentioned earlier (see the 
discussion preceding T68-1). We shall explain under what conditions they 
occur and illustrate them by simple examples with numerical values. Rele- 
vance and irrelevance is always meant to h on e. 


1a. It is possible that each of two sentences is positive and nevertheless
their conjunction is negative. For i and j, this occurs only in the case
No. 9A in table T1a. The following is an example of r-values for this
case: the values for (1) to (4) are −r, 2r, 2r, −3r, respectively; hence
(5) and (6), that is, i and j, have both the value r. (This is like the example
given above for No. 6E, but with opposite signs.) Generally speaking,
any case constructed in the following way is of this kind. For given
e and h, we take any three sentences (1), (2), and (3) satisfying the following
conditions: they are L-exclusive in pairs with respect to e; (1) is
negative (to h on e), its r-value being −r; both (2) and (3) are positive
such that the r-value of each is greater than r. When we have found any
three sentences of this kind, we take as i the disjunction of (1) and (2),
and as j that of (1) and (3). Then for both i and j r > 0, and hence they
are positive; but i . j, which is (1), is negative.

1b. It is possible that each of two sentences is negative while their conjunction
is positive. This occurs only in No. 6E. Cases of this kind can
be constructed like those for (1a) but with opposite r-values.

Example for 1a. Let the prior evidence e contain the following information.
Ten chess players participate in a chess tournament in New York City; some
of them are local people, some from out of town; some are junior players, some
seniors; some are men (M), some women (W). Their distribution is known to
be as follows (see diagram): among the local juniors there is 1 M, 2 W; among
the local seniors 2 M, no W; among the stranger juniors 2 M, no W; among
the stranger seniors no M, 3 W.

                         i                 i′
                   Local Players       Strangers

   j    Juniors      M, W, W             M, M
   j′   Seniors      M, M                W, W, W

It is known that one and only one of the ten
players will be the winner. Furthermore, the evidence e is supposed to be such
that on its basis each of the ten players has an equal chance of becoming the
winner, hence 1/10. (For this assumption we do not presuppose the principle
of indifference; it may be that e contains reports about previous achievements
of the players; all that is assumed is that, for the chosen c, each of the ten
possibilities has the c-value 1/10 on e. Any additional evidence considered in this
and the subsequent examples supplies the information that some of the ten
players cannot win. It is assumed that in each case, on the basis of the
increased evidence, the remaining players have equal chances of winning.) An
observer X, who has this prior evidence e, considers the hypothesis h: ‘A man
wins’. (We take this neutral, tenseless formulation instead of the customary
‘a man will win’ because the same sentence will later be considered at other
time points. The same holds for the formulations of i and j.) Consider the five
sentences each of which predicts the winning of one of the five men. These
sentences are L-exclusive in pairs with respect to e; and for each of them c = 1/10.
h is the disjunction of these five sentences. Therefore c(h,e) = 5/10 = 1/2
(T59-1m). Now suppose that X receives during the course of the tournament
the following report i: ‘A local player wins’. The man who reports this to X
may have seen on the scoreboard that, on the basis of the games finished so far,
all strangers are out, that is, can no longer become the winner; in other words,
only local people are still in. (The sentences (1) to (4) in our previous discussion
and in the table T1a are here as follows: (1): ‘a local junior wins’; (2): ‘a
local senior wins’; (3): ‘a stranger junior wins’; and (4): ‘a stranger senior
wins’; i is L-equivalent (with respect to e) to the disjunction of (1) and (2).)
On the basis of the increased evidence e . i, the chance of winning is the same
for each of the five local players, hence 1/5. There are 3 M among the five local
players. Therefore, c(h, e . i) = 3/5. Thus the c of the prediction h has been
increased by the addition of the new information i from 1/2 to 3/5. Hence i is
positive to h on e. Suppose now that X receives instead of i the following
report j: ‘A junior wins’. There are 3 M among the five juniors. Therefore
c(h, e . j) = 3/5. Thus the addition of j to e leads likewise to an increase in
the c of h from 1/2 to 3/5. Hence j too is positive. However, if X receives both
reports i and j, then he learns from them that a local junior wins. There are
three local juniors, among them one man. Therefore c(h, e . i . j) = 1/3. Thus
by the addition of i . j to e the c of h has been decreased from 1/2 to 1/3. Hence
i . j is negative to h on e.

Example for 1b. Let e, i, and j be as in the previous example. We take here
the prediction h′: ‘A woman wins’. Thus ⊢ e ⊃ (h′ ≡ ~h). Therefore c(h′,e) =
1 − c(h,e), and the same holds for the other evidences containing e. Thus we
find that c(h′,e) = 1/2; c(h′, e . i) = c(h′, e . j) = 2/5. Hence both i and j are
negative to h′ on e. On the other hand, c(h′, e . i . j) = 2/3. Hence i . j is
positive to h′ on e.


Results in the Examples 1a and 1b

Evidence     c for h   Example 1a           c for h′   Example 1b
e             0.5                            0.5
e . i         0.6      i is positive         0.4       i is negative
e . j         0.6      j is positive         0.4       j is negative
e . i . j     0.33     i . j is negative     0.67      i . j is positive
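The c-values of Examples 1a and 1b can be recomputed by simple counting. The sketch below encodes the ten players of the diagram and follows the stated assumption of the example, namely that all players not yet excluded by the evidence have equal chances; the tuple encoding and the function names are illustrative only.

```python
from fractions import Fraction

# One tuple per player: (is_local, is_junior, is_man), following the diagram.
PLAYERS = (
    [(True, True, True)] + [(True, True, False)] * 2   # local juniors: M, W, W
    + [(True, False, True)] * 2                        # local seniors: M, M
    + [(False, True, True)] * 2                        # stranger juniors: M, M
    + [(False, False, False)] * 3                      # stranger seniors: W, W, W
)


def c(hypothesis, evidence):
    """c-value of the hypothesis when all remaining players are equally likely."""
    remaining = [p for p in PLAYERS if evidence(p)]
    return Fraction(sum(1 for p in remaining if hypothesis(p)), len(remaining))


def relevance(new, hypothesis):
    """Classify the new evidence by comparing c before and after adding it."""
    before = c(hypothesis, lambda p: True)
    after = c(hypothesis, new)
    return "positive" if after > before else "negative" if after < before else "irrelevant"


man = lambda p: p[2]
woman = lambda p: not p[2]
local = lambda p: p[0]                 # report i
junior = lambda p: p[1]                # report j
both = lambda p: p[0] and p[1]         # i . j

for name, hyp in (("h", man), ("h'", woman)):
    print(name, [str(c(hyp, ev)) for ev in (local, junior, both)],
          [relevance(ev, hyp) for ev in (local, junior, both)])
```

Exact fractions are used so that the comparison with 1/2 does not depend on rounding.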
2a. It is possible that each of two sentences is positive and nevertheless
their disjunction is negative. For i and j, this occurs only in case No.
6A in the table T1a. For this case we have given earlier (in the explanations
preceding T1a) the following example of r-values: if the values for
(1) to (4) are 3r, −2r, −2r, and r, respectively, then (5) and (6), that is,
i and j, have both the value r. Generally speaking, any case constructed
in the following way is of this kind. For given e and h, we take any three
sentences (1), (2), and (3) satisfying the following conditions: they are
L-exclusive in pairs with respect to e; (2) and (3) are negative (to h on e),
their r-values being −r2 and −r3, respectively; (1) is positive such that
its r-value is greater than r2 and greater than r3 but less than r2 + r3;
the latter condition is required in order to assure that for (4) r > 0 and
hence for (7) r < 0. We take again as i the disjunction of (1) and (2), and
as j that of (1) and (3). Then for both i and j r > 0, hence they are
positive; but i V j, which is (7), is negative.


2b. It is possible that each of two sentences is negative while their disjunction
is positive. This occurs only in No. 9E. Cases of this kind can be
constructed like those for (2a) but with opposite r-values.


Example for 2a. We take e as before, and h′ as in the example for 1b: ‘A woman
wins’. Hence c(h′,e) = 1/2. Let i′ be: ‘A stranger wins’; this is L-equivalent
to ~i with respect to e. Among the five strangers are 3 W. Hence c(h′, e . i′) =
3/5 > 1/2. Thus i′ is positive to h′ on e. Let j′ be: ‘A senior wins’; this is
L-equivalent to ~j with respect to e. Among the five seniors there are again 3 W.
Hence c(h′, e . j′) = 3/5. Thus j′ too is positive. i′ V j′ says that a stranger
or a senior wins. Among the seven players who are strangers or seniors (including
stranger seniors) there are 3 W. Hence c(h′, e . (i′ V j′)) = 3/7 < 1/2. Thus
i′ V j′ is negative to h′ on e.

Example for 2b. We take e, i′, and j′ as in 2a, but h as in 1a. h is L-equivalent
to ~h′ with respect to e; therefore the c-values here are the complements of
those in 2a with respect to 1. c(h,e) = 1/2. c(h, e . i′) = c(h, e . j′) = 2/5 < 1/2.
Hence both i′ and j′ are negative to h on e. c(h, e . (i′ V j′)) = 4/7 > 1/2.
Hence i′ V j′ is positive to h on e.


Results in the Examples 2a and 2b

Evidence          c for h′   Example 2a               c for h   Example 2b
e                  0.5                                 0.5
e . i′             0.6       i′ is positive            0.4      i′ is negative
e . j′             0.6       j′ is positive            0.4      j′ is negative
e . (i′ V j′)      0.43      i′ V j′ is negative       0.57     i′ V j′ is positive
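The chess example can also be connected with the pattern of r-values given for case No. 6A. The sketch below relies on two assumptions that are not stated in this section: that the relevance measure is r(i,h,e) = m(e . i . h) × m(e) − m(e . i) × m(e . h), and that m is normalized so that m(e) = 1 with weight 1/10 for each player. Under these assumptions it computes r (to h′ on e) for the four sentences i′ . j′, i′ . ~j′, ~i′ . j′, and ~i′ . ~j′.

```python
from fractions import Fraction

# One tuple per player: (is_local, is_junior, is_man), following the diagram.
PLAYERS = (
    [(True, True, True)] + [(True, True, False)] * 2   # local juniors: M, W, W
    + [(True, False, True)] * 2                        # local seniors: M, M
    + [(False, True, True)] * 2                        # stranger juniors: M, M
    + [(False, False, False)] * 3                      # stranger seniors: W, W, W
)


def m(sentence):
    """Assumed measure: each player has weight 1/10, so that m(e) = 1."""
    return Fraction(sum(1 for p in PLAYERS if sentence(p)), len(PLAYERS))


def r(i, h):
    """Assumed relevance measure with m(e) = 1: m(e.i.h) - m(e.i) * m(e.h)."""
    return m(lambda p: i(p) and h(p)) - m(i) * m(h)


woman = lambda p: not p[2]             # h'
stranger = lambda p: not p[0]          # i'
senior = lambda p: not p[1]            # j'

cells = [
    lambda p: stranger(p) and senior(p),          # (1)  i' . j'
    lambda p: stranger(p) and not senior(p),      # (2)  i' . ~j'
    lambda p: not stranger(p) and senior(p),      # (3)  ~i' . j'
    lambda p: not stranger(p) and not senior(p),  # (4)  ~i' . ~j'
]
values = [r(cell, woman) for cell in cells]
print([str(v) for v in values])

r_i = r(stranger, woman)                               # column (5)
r_j = r(senior, woman)                                 # column (6)
r_disj = r(lambda p: stranger(p) or senior(p), woman)  # column (7)
print(r_i, r_j, r_disj)
```

With r = 1/20 the four values have the form 3r, −2r, −2r, r, which is the example given for No. 6A; the values for i′, j′, and i′ V j′ then come out as r, r, and −r.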


Let us now investigate the case where j follows either from i alone or
from i together with e. Our problem is whether in this case we can infer
from the relevance of i (to h on e) something about the relevance of j,
and vice versa. The following theorem T5 dealing with this situation is
based on the earlier theorem on r-values in cases of deductive relations
(T68-4).

T69-5. Let ⊢ e . i ⊃ j, hence e . i . ~j is L-false. (This holds too if
⊢ i ⊃ j, hence i . ~j is L-false.)

   a. r(j,h,e) = r(i,h,e) + r(~i . j,h,e). (From T68-4b(3) and (4).)

   b. (1) r(i,h,e) = r(j,h,e) − r(~i . j,h,e). (From (a).)

      (2)          = r(j,h,e) + r(i V ~j,h,e). (From (1), T67-5a,
                     T21-5f(3).)

We see from T5 that neither r for j is determined by r for i nor vice
versa; in each case another sentence is also to be taken into consideration,
and its influence may change the sign of r. Thus, if r for i is > 0, r for
j may be > 0, < 0, or 0; and likewise if r for i is < 0, or is 0; and conversely,
if r for j has any sign, for i all three cases are still possible. In
other words, in spite of the deductive relation holding, there are still all
nine combinations possible for i and j. This is shown by the following table.


        The Nine Combinations in the Case that e . i L-implies j

               Signs of r    Case No.       Examples of Values for r
                                        (1), (5)   (2)    (3)    (4)    (6)
For T69-5:      i     j      in T1a        i                             j
For T71-5:      h     k      in T1b        h                             k

                +     +        21          r        0      r    −2r     2r
                               23         2r        0     −r     −r      r
                               45          r        0      0     −r      r
                +     −        22          r        0    −2r      r     −r
                +     0        47          r        0     −r      0      0
                −     +        25         −r        0     2r     −r      r
                −     −        24        −2r        0      r      r     −r
                               26         −r        0     −r     2r    −2r
                               46         −r        0      0      r     −r
                −     0        48         −r        0      r      0      0
                0     +        39          0        0      r     −r      r
                0     −        40          0        0     −r      r     −r
                0     0        51          0        0      0      0      0

In the first two columns for i and j the nine combinations are listed. The
next column cites for each combination at least one case from the list
T1a. The following columns (1) to (4) and (6) correspond to the columns
in T1a for the cases in question. Here, however, we give not only the
signs of r as in T1a but examples of r-values in accord with those signs;
r is here some positive real number. Since ⊢ e . i ⊃ j, the r-value for (2) is
always 0 (T68-4b(1)); the cases here listed are all those from T1a where
this holds. The r-value for (5), that is, i, is always the sum of the values
for (1) and (2) (T68-3b), hence here it is the same as that for (1); the
value for (6), that is, j, is the sum of those for (1) and (3) (T68-3c).
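The rows of the table of nine combinations can be checked in the same spirit as the earlier examples. In the sketch below the values are taken with r = 1 as they are read here from the table; the case number 23 for the second row is an inference from the order of the enumeration and should be treated as an assumption. The check confirms that column (2) is always 0, that the four values sum to 0, and that the sign pairs for i and j cover all nine combinations.

```python
# Case No. -> sample values for columns (1) to (4), as multiples of r (r = 1).
ROWS = {
    21: (1, 0, 1, -2),
    23: (2, 0, -1, -1),   # case number inferred, see the note above
    45: (1, 0, 0, -1),
    22: (1, 0, -2, 1),
    47: (1, 0, -1, 0),
    25: (-1, 0, 2, -1),
    24: (-2, 0, 1, 1),
    26: (-1, 0, -1, 2),
    46: (-1, 0, 0, 1),
    48: (-1, 0, 1, 0),
    39: (0, 0, 1, -1),
    40: (0, 0, -1, 1),
    51: (0, 0, 0, 0),
}


def sign(x):
    return "+" if x > 0 else "-" if x < 0 else "0"


def sign_pairs(rows):
    """Map each case to the signs of r for i (column 5) and j (column 6)."""
    pairs = {}
    for case, (r1, r2, r3, r4) in rows.items():
        assert r2 == 0, case                    # T68-4b(1): e . i L-implies j
        assert r1 + r2 + r3 + r4 == 0, case     # T68-3a
        pairs[case] = (sign(r1 + r2), sign(r1 + r3))
    return pairs


pairs = sign_pairs(ROWS)
print(sorted(set(pairs.values())))
```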

Simple cases where L-implication holds between two sentences and
nevertheless the one is positive and the other negative can easily be found
in the following way as special cases of the kinds 1a, 1b, 2a, and 2b earlier
discussed.

1a. It is possible that i is positive (to h on e) but i . j, although it
L-implies i, is negative. See the former case 1a and the example for it.

1b. It is possible that i is negative but i . j is positive. See the former
case 1b and the example for it.

2a. It is possible that i is positive but i V j, although L-implied by i,
is negative. See the former case 2a and the example for it.

2b. It is possible that i is negative but i V j is positive. See the former
case 2b and the example for it.


These results are important because they show that certain opinions
which seem to have been held sometimes are untenable. One is the view
that, if i is positive (to h on e), then every sentence L-implied by i is
likewise positive. The other is the view that, if i is positive, then every
sentence L-implying i is likewise positive.


§ 70. Relevance Measures for Two Hypotheses and Their Connections 


In this and the next section we investigate the case of two hypotheses h and
k, and in particular the relations between the r-values of i to h and to k (on e),
on the one hand, and the r-values of i to certain connections of h and k,
especially h V k and h . k, on the other. Thus this section is analogous to § 68,
which dealt with two evidences i and j. And, indeed, every theorem of § 68 can
be transformed into an analogous one here concerning two hypotheses, because
of the commutativity of r. Thus we find here two new theorems of additivity
for r: (1) if h and k are L-exclusive with respect to e, then the r-value for i to
h V k (on e) is the sum of the r-values for i to h and to k (T1b); (2) if h and k
are L-disjunct with respect to e, then the r-value for i to h . k (on e) is the sum
of the r-values for i to h and to k (T2b).

If the r-values for i to each of the hypotheses h . k, h . ~k, ~h . k, and
~h . ~k (on e) are given, then the r-values for i to h, k, and h V k (on e) can
be obtained as sums of one, two, or three of those four r-values. This is done first
generally (T3) and then for all possible cases of deductive relations between
h and k on the evidence e (T4).


We have discussed in § 68 the relation between the relevance measures
of two evidences i and j, for instance, observation sentences, to a certain
hypothesis h on the basis of a prior evidence e and the relevance measures
of certain connections, especially i V j and i . j, to h on e. Because of the
commutativity of r, every result we have found there can likewise be
applied to the case of two hypotheses, say, h and k, and their connections.
Suppose, for example, that we have found that under certain conditions
for i and j r(i V j,h,e) > 0 and hence i V j is positive to h on e.
Then, by commutation (T67-3), we obtain the result that under the same
conditions r(h,i V j,e) > 0. Here, simultaneous substitution of ‘i’, ‘h’,
and ‘k’ for ‘h’, ‘i’, and ‘j’, respectively, yields this: if h and k satisfy
certain conditions, namely, those stipulated in the first case for i and j, then
r(i,h V k,e) > 0, hence i is positive to h V k on e. In this way we reach
theorems concerning the relevance of a given new evidence i to disjunctions
or conjunctions of two hypotheses.

From a merely theoretical point of view there would not be much pur- 
pose in stating the new theorems, since they derive from the former ones 
merely by commutation and substitution. However, the practical situations 
to which the new theorems are applicable are quite different from those 
for the earlier theorems. There we had the case of an observer X who con- 
siders the relevance of two possible observations he might make and of 
their connections, while all the time it is only one hypothesis, say, a law 
or a singular prediction, for which the relevance is meant. Here, on the 
other hand, X considers the relevance of one observation, actually made 
or expected as possible, to two different hypotheses, for example, two 
predictions concerning different features of tomorrow’s weather, and to 
their connections, especially their disjunction and their conjunction. For 
this reason, which concerns more the methodology of application than 
the system of inductive logic itself, it seems convenient to have the 
theorems which will be stated here in addition to the previous ones. 

In most cases it will be unnecessary to give proofs. It will be sufficient 
to indicate for each theorem here that earlier theorem to which it is 
analogous and from which it is derivable in the way indicated above or 
in a similar simple manner. 

The following two theorems of additivity are the analogues to T68-1 
and 2, respectively. 

T70-1. Additivity for disjunctions of hypotheses.

    a. r(i,h V k,e) = r(i,h,e) + r(i,k,e) − r(i,h . k,e). (From T68-1a, T67-3.)
   +b. Let m(e . h . k) = 0. (This is the case in particular if e . h . k is
       L-false, in other words, if h and k are L-exclusive with respect to e.)
       Then r(i,h V k,e) = r(i,h,e) + r(i,k,e). (From T68-1b.)
    c. Let h be a disjunction with n (≥ 2) components: h1 V h2 V ... V hn.
       For any two distinct components hm and hp, let m(e . hm . hp) = 0.
       (This is the case if h1, ..., hn are L-exclusive in pairs with respect
       to e.) Then r(i,h,e) = Σ(p = 1 to n) r(i,hp,e). (From T68-1c.)


T70-2. Additivity for conjunctions of hypotheses.

    a. r(i,h . k,e) = r(i,h,e) + r(i,k,e) − r(i,h V k,e). (From T68-2a.)
    b. Let m(e . ~h . ~k) = 0. (This is the case if ⊢ e ⊃ h V k, in other
       words, if h and k are L-disjunct with respect to e.) Then r(i,h . k,e) =
       r(i,h,e) + r(i,k,e). (From T68-2b.)
    c. Let h be a conjunction with n (≥ 2) components: h1 . h2 . ... . hn.
       For any two distinct components hm and hp, let m(e . ~hm . ~hp) =
       0. (This is the case if hm and hp are L-disjunct with respect to e,
       i.e., ⊢ e ⊃ hm V hp.) Then r(i,h,e) = Σ(p = 1 to n) r(i,hp,e). (From T68-2c.)

Suppose two hypotheses h and k are given. We consider the four sentences
h . k, h . ~k, ~h . k, and ~h . ~k. These sentences form a convenient
basis for studying the r-values for any connections of h and k, because
any such value can be determined as sum of the r-values of some
of those four sentences. This leads to T3 as an analogue to T68-3.
In this section and the next one, we shall use the symbols ‘r1’, etc., for
four r-values as follows. (Note that these symbols will no longer have
the meanings they had in the two preceding sections; the r-values here
denoted by them are analogous to but not identical with the earlier ones.)

      r1 = r(i, h . k, e),
      r2 = r(i, h . ~k, e),
      r3 = r(i, ~h . k, e),
      r4 = r(i, ~h . ~k, e).


T70-3. Let e, h, k, and i be sentences in 𝔏.

    a. r1 + r2 + r3 + r4 = 0. (From T68-3a.)
    b. r(i,h,e) = r1 + r2. (From T68-3b.)
    c. r(i,k,e) = r1 + r3. (From T68-3c.)
    d. (1) r(i,h V k,e) = r1 + r2 + r3;
       (2)              = −r4.
       (From T68-3d.)
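T70-3 can be illustrated numerically in a small finite model. The sketch below makes assumptions that go beyond this section: sentences are represented as truth functions over the sixteen truth-value assignments to (e, i, h, k), m is an arbitrary positive weighting of these assignments, and the relevance measure is taken to be r(i,h,e) = m(e . i . h) × m(e) − m(e . i) × m(e . h). All helper names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product

ATOMS = list(product([True, False], repeat=4))   # assignments to (e, i, h, k)


def random_measure(seed):
    """An arbitrary positive weighting of the atoms, normalized to total 1."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in ATOMS]
    total = sum(weights)
    return {atom: Fraction(w, total) for atom, w in zip(ATOMS, weights)}


def m(measure, sentence):
    return sum((w for atom, w in measure.items() if sentence(*atom)), Fraction(0))


def conj(*sentences):
    return lambda *atom: all(s(*atom) for s in sentences)


def r(measure, i, h, e):
    """Assumed definition of the relevance measure (see the note above)."""
    return (m(measure, conj(e, i, h)) * m(measure, e)
            - m(measure, conj(e, i)) * m(measure, conj(e, h)))


E = lambda e, i, h, k: e
I = lambda e, i, h, k: i
H = lambda e, i, h, k: h
K = lambda e, i, h, k: k
NOT = lambda s: (lambda *atom: not s(*atom))
OR = lambda s, t: (lambda *atom: s(*atom) or t(*atom))


def check_t70_3(measure):
    cells = [conj(H, K), conj(H, NOT(K)), conj(NOT(H), K), conj(NOT(H), NOT(K))]
    r1, r2, r3, r4 = (r(measure, I, cell, E) for cell in cells)
    assert r1 + r2 + r3 + r4 == 0                               # T70-3a
    assert r(measure, I, H, E) == r1 + r2                       # T70-3b
    assert r(measure, I, K, E) == r1 + r3                       # T70-3c
    assert r(measure, I, OR(H, K), E) == r1 + r2 + r3 == -r4    # T70-3d
    assert r(measure, I, H, E) == r(measure, H, I, E)           # commutativity
    return r1, r2, r3, r4


for seed in range(20):
    check_t70_3(random_measure(seed))
print("T70-3 holds on all sampled measures")
```

Exact fractions are used so that the equalities are tested without rounding error.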


If any deductive relations hold between h and k, then one or two or
three of the four sentences are L-false and hence have the r-value 0.
These and related cases are dealt with in T4, which is analogous to T68-4.


T70-4.
    a. Let m(e . h . k) = 0. (This is the case in particular if e . h . k is
       L-false, in other words, ⊢ e . h ⊃ ~k, h and k are L-exclusive with
       respect to e.) Then the following holds.
       (1) r1 = 0.
       (2) r2 + r3 + r4 = 0.
       (3) r(i,h,e) = r2.
       (4) r(i,k,e) = r3.
       (5) r(i,h V k,e) = r2 + r3 = −r4.
       (From T68-4a.)


    b. Let m(e . h . ~k) = 0. (This is the case if e . h . ~k is L-false; hence
       ⊢ e . h ⊃ k.)
       (1) r2 = 0.
       (2) r1 + r3 + r4 = 0.
       (3) r(i,h,e) = r1.
       (4) r(i,k,e) = r(i,h V k,e) = r1 + r3 = −r4.
       (From T68-4b.)

    c. Let m(e . ~h . k) = 0. (This is the case if e . ~h . k is L-false; hence
       ⊢ e . k ⊃ h.)
       (1) r3 = 0.
       (2) r1 + r2 + r4 = 0.
       (3) r(i,k,e) = r1.
       (4) r(i,h,e) = r(i,h V k,e) = r1 + r2 = −r4.
       (From T68-4c.)

    d. Let m(e . ~h . ~k) = 0. (This is the case if e . ~h . ~k is L-false;
       in other words, ⊢ e ⊃ h V k, h and k are L-disjunct with respect to e.)
       (1) r4 = 0.
       (2) r1 + r2 + r3 = 0.
       (3) r(i,h,e) = r1 + r2 = −r3.
       (4) r(i,k,e) = r1 + r3 = −r2.
       (5) r(i,h V k,e) = 0.
       (From T68-4d.)


    e. Let m(e . h . k) = m(e . h . ~k) = 0; in other words, m(e . h) = 0.
       (This is the case if e . h is L-false; hence ⊢ e ⊃ ~h.)
       (1) r1 = r2 = 0.
       (2) r3 + r4 = 0.
       (3) r(i,h,e) = 0.
       (4) r(i,k,e) = r(i,h V k,e) = r3 = −r4.
       (From T68-4e.)

    f. Let m(e . h . k) = m(e . ~h . k) = 0; in other words, m(e . k) = 0.
       (This is the case if e . k is L-false; hence ⊢ e ⊃ ~k.)
       (1) r1 = r3 = 0.
       (2) r2 + r4 = 0.
       (3) r(i,k,e) = 0.
       (4) r(i,h,e) = r(i,h V k,e) = r2 = −r4.
       (From T68-4f.)

    g. Let m(e . h . k) = m(e . ~h . ~k) = 0; in other words,
       m(e . (h ≡ k)) = 0. (This is the case if e . (h ≡ k) is L-false; hence
       ⊢ e ⊃ (h ≡ ~k).)
       (1) r1 = r4 = 0.
       (2) r2 + r3 = 0.
       (3) r(i,h,e) = r2 = −r3.
       (4) r(i,k,e) = r3 = −r2 = −r(i,h,e).
       (5) r(i,h V k,e) = 0.
       (From T68-4g.)

    h. Let m(e . h . ~k) = m(e . ~h . k) = 0; in other words,
       m(e . ~(h ≡ k)) = 0. (This is the case if e . ~(h ≡ k) is L-false;
       hence ⊢ e ⊃ (h ≡ k), h and k are L-equivalent with respect to e.)
       (1) r2 = r3 = 0.
       (2) r1 + r4 = 0.
       (3) r(i,h,e) = r(i,k,e) = r(i,h V k,e) = r1 = −r4.
       (From T68-4h.)


» Let m(e. h. ~k) = m(e. ~h . ~k) = o; in other words, m(e . ~k) 


= o. (This is the case if e. ~k is L-false; hence } e D k.) 
(1) = r= o. 

(2) tr +14; = o. 

(3) t@h,e) = r: = =r. 

(4) r(@,k,e) = r(i,h V k,e) = o. 

(From T68-gi.) 


» Letm(e.~h.2) = m(e. ~h . ~k) = o; in other words, m(e. ~h) 


= o. (This is the case if e . ~h is L-false; hence | e D h.) 

(1) n = % =o. 

(2) r+ t= o. 

(3) rli,h,e) = r(i,h V k,e) = o. 

(4) r(i,k,e) = t, = =r.. 

(From T68-4j.) 

k. Let any three of the four sentences h . k, h . ~k, ~h . k, and
~h . ~k be selected. Let m = 0 for the three conjunctions of e
with each of the selected sentences. (This is the case if these three
conjunctions are L-false, and hence e L-implies the one sentence
among the four which has not been selected.)
(1) r_1 = r_2 = r_3 = r_4 = 0.
(2) r for i is 0 to each of the sentences h, k, h V k, and h . k (on e).
(From T68-4k.)
l. Let m(e . h . k) = m(e . h . ~k) = m(e . ~h . k) = m(e . ~h . ~k)
= 0; in other words, m(e) = 0. (This holds if e is L-false.)
(1) r_1 = r_2 = r_3 = r_4 = 0.
(2) r for i is 0 to each of the sentences h, k, h V k, h . k (on e).
(From T68-4l.)


T4 deals with all possible cases of deductive relations between h and k
on the evidence e. If the r-value of i is given for each of the four
L-exclusive hypotheses h . k, h . ~k, ~h . k, and ~h . ~k, then T3 and T4
state the values for h, k, and h V k, and enable us to determine easily the
values for any other connections of h and k. T3 does this in general, and
T4 for all cases of deductive relations.
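
The bookkeeping described here can be sketched in a few lines of Python. This is an illustration only, not part of the system: the function name and the sample numbers are assumptions, and the relations used (r for h is r_1 + r_2, for k is r_1 + r_3, for h V k is r_1 + r_2 + r_3, and the four values sum to 0) are taken as they appear in the special cases listed in T4.

```python
def r_for_connections(r1, r2, r3, r4):
    """Given the r-values of i (on e) for h.k, h.~k, ~h.k, ~h.~k,
    return the r-values of i for h, k, h.k, and h V k."""
    if r1 + r2 + r3 + r4 != 0:
        # The four hypotheses are L-exclusive and jointly exhaustive,
        # so their r-values must cancel.
        raise ValueError("the four r-values must sum to 0")
    return {
        "h": r1 + r2,           # h is h.k V h.~k
        "k": r1 + r3,           # k is h.k V ~h.k
        "h.k": r1,
        "h V k": r1 + r2 + r3,  # equals -r4
    }

# General case (sample values chosen for illustration).
general = r_for_connections(-1, 2, 2, -3)

# Deductive case d: m(e.~h.~k) = 0, hence r4 = 0.
case_d = r_for_connections(1, 2, -3, 0)
```

In the second call the results agree with T4d: r for h is -r_3, r for k is -r_2, and r for h V k is 0.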


§ 71. The Possible Relevance Situations for Two Hypotheses and Their 
Connections 


The possible relevance situations for i to two hypotheses h and k and their
connections are investigated, as characterized by the signs of r. Thus this
section is analogous to § 69. A complete list of the possible relevance
situations is given by another interpretation of the earlier table (T69-1b).
With the help of this table, general theorems about possible relevance
situations are derived, first in terms of signs of r (T2), and then in terms
of the relevance concepts (T3). Four kinds of relevance situations, whose
possibility seems surprising at first glance, are studied more in detail;
among them are the following: (3a) i is positive (on e) to both h and k, but
negative to h . k; (4a) i is positive (on e) to both h and k, but negative to
h V k. These possibilities are illustrated by examples with numerical values.
Finally, a general theorem and examples show the following: if it is known
that k is L-implied either by h alone or by e . h, then from the relevance of
i to h (on e) nothing can be inferred concerning the relevance of i to k, or
vice versa.


We have earlier constructed, on the basis of T68-4, the table T69-1a
which lists all possible relevance situations with respect to i, j, and their
connections. Each relevance situation is here characterized, not by the
numerical values of r for the sentences involved, but merely by what we
have called the sign of r, that is to say, a statement saying whether r is
>0, <0, or 0. Now on the basis of T70-4, we can construct a completely
analogous table. This table represents another but analogous class of
possible relevance situations, namely, those for the sentence i, which remains
the same throughout, but with respect to several hypotheses, viz., (1)
h . k, (2) h . ~k, (3) ~h . k, (4) ~h . ~k, (5) h, (6) k, and (7) h V k.
As previously, the relevance situations are characterized by the signs of r.
It is however not necessary to write an entirely new table; we take now
as Table T69-1b simply the earlier table but with the seven sentences
just mentioned at the heads of the seven columns, as indicated there on
the line b. This simple procedure is possible because of the perfect analogy
between T70-4 and T68-4; but we can also derive T69-1b directly from
T69-1a, without the use of T70-4, by commutation and substitution, as
explained in the preceding section.

As the table T69-1a led us to T69-2, so now the table T69-1b may lead 
us to the following analogous theorem T2; the latter can, however, be 
derived more simply from T69-2 directly by commutation. T2 states in 
general terms which combinations of signs of r are possible for i to h, k, 
h.k, and hV k. 


T71-2. Let four sentences e, h, k, and i in L be given. (a), (b), (c), and
(d) deal with four cases concerning the signs of r for i to the hypotheses h
and k on e. It is easily seen that for any four sentences exactly one of
these cases (a) to (d) applies. r is here always meant for i on e; thus only
the hypothesis (i.e., the second argument of r) is explicitly referred to in
each case.

a. Let r either be >0 to both h and k (i.e., r(i,h,e) and r(i,k,e) > 0),
or >0 to one of them and 0 to the other. Then the following holds.
(1) r > 0 to at least one of the hypotheses h . k and h V k.
(2) If r > 0 to h . k, then it may be >0, <0, or 0 to h V k.
(3) If r > 0 to h V k, then it may be >0, <0, or 0 to h . k.
(4) Let m(e . h . k) = 0. (This is the case in particular if e . h . k is
L-false; in other words, h and k are L-exclusive with respect to
e.) Then r = 0 to h . k, and r > 0 to h V k.
(5) Let m(e . ~h . ~k) = 0. (This is the case if e . ~h . ~k is
L-false; hence ⊢ e ⊃ h V k.) Then r = 0 to h V k, and r > 0
to h . k.
(From T69-2a.)

b. Let r either be <0 to both h and k, or <0 to one of them and 0 to
the other. Then the following holds.
(1) r < 0 to at least one of the hypotheses h . k and h V k.
(2) If r < 0 to h . k, then r may be >0, <0, or 0 to h V k.
(3) If r < 0 to h V k, then r may be >0, <0, or 0 to h . k.
(4) Let m(e . h . k) = 0. Then r = 0 to h . k, and r < 0 to h V k.
(5) Let m(e . ~h . ~k) = 0. Then r = 0 to h V k, and r < 0 to h . k.
(From T69-2b.)

c. Let r be >0 to one of the hypotheses h and k, and <0 to the other.
Then r may be >0, <0, or 0 to h . k, and it may also be >0, <0, or 0
to h V k, independently of h . k; that is to say, all nine combinations
are possible. (From T69-2c.)

d. Let r = 0 to both h and k. Then either r = 0 to both h . k and h V k,
or r > 0 to the one and r < 0 to the other; in the latter case, the
one r-value is the opposite of the other. (From T69-2d.)

While T2 is in terms of the three signs of r, the similar theorem T3 uses
instead the three corresponding relevance concepts, viz., positive
relevance, negative relevance, and irrelevance. We require in T3 that e . i
is not almost L-false. This assures that the correspondence between the
three r-signs and the three relevance concepts holds here throughout (as
seen from T67-8 and T67-9). [With respect to this restricting condition,
the analogy between T69-3 and T3 does not hold. In T69-3 it was necessary
to require that neither e . i nor e . j is almost L-false. The analogous
condition here would be that neither e . h nor e . k is almost L-false.
However, it is here sufficient to require instead that e . i is not almost
L-false. We do not prove T3 simply with the help of T69-3 by commutation,
because the symmetry of the relevance concepts (T65-6) has earlier been
proved only on the basis of assumptions which were stronger than the
condition just mentioned. We shall instead base the proof on the theorem T2
concerning r-signs and the earlier theorems (T67-8 and T67-9) stating the
correspondence between r-signs and relevance concepts.]


+T71-3. Let e, h, k, and i be sentences in L. For e it is assumed that
e . i is not almost L-false. Relevance and irrelevance are here always
meant on evidence e.

a. Let i be either positive to both h and k, or positive to one of them
and irrelevant to the other. Then the following holds.
(1) i is positive to at least one of the hypotheses h . k and h V k.
(2) If i is positive to h . k, then for h V k all three cases are possible,
that is to say, i may be positive, negative, or irrelevant to h V k.
(3) If i is positive to h V k, then for h . k all three cases are possible.
(4) Let m(e . h . k) = 0. (This is the case if e . h . k is L-false.)
Then i is irrelevant to h . k and positive to h V k.
(5) Let m(e . ~h . ~k) = 0. (This is the case if e . ~h . ~k is
L-false; hence ⊢ e ⊃ h V k.) Then i is irrelevant to h V k, and
positive to h . k.

b. Let i either be negative to both h and k, or negative to one of them
and irrelevant to the other.
(1) i is negative to at least one of the hypotheses h . k and h V k.
(2) If i is negative to h . k, then all three cases are possible for h V k.
(3) If i is negative to h V k, then all three cases are possible for h . k.
(4) Let m(e . h . k) = 0. Then i is irrelevant to h . k, and negative
to h V k.
(5) Let m(e . ~h . ~k) = 0. Then i is irrelevant to h V k, and
negative to h . k.

c. Let i be positive to one of the hypotheses h and k, and negative
to the other. Then i may have any of the three relevance relations
to h . k, and, independently, any of the three to h V k; that is to
say, all nine combinations are possible.

d. Let i be irrelevant to both h and k. Then i is either irrelevant to both
h . k and h V k, or it is positive to the one and negative to the other.
(From T2, T67-8, T67-9.)


T3 states which relevance situations are possible in terms of the rele- 
vance concepts. Among these relevance situations there are some whose 
possibility seems surprising at first glance. We shall now describe and 
analyze four kinds of such cases, and then illustrate them by examples. 
The cases 3a, 3b, 4a, and 4b here are analogous to the cases 1a, 1b, 2a,
and 2b, respectively, in § 69.

3a. It is possible that i is positive to each of two hypotheses h and k (on e)
and nevertheless negative to their conjunction. This occurs only in the case
No. 9A in table T69-1b. Example of r-values: for (1) to (4), -r, 2r, 2r,
and -3r, respectively, hence r for (5) and for (6). In general, any case
constructed by a procedure analogous to that described under (1a) in
§ 69 is of this kind (with h, k, and i for i, j, and h, respectively).

3b. It is possible that i is negative to each of two hypotheses h and k
(on e) and nevertheless positive to their conjunction. This occurs only in
No. 6E.


Example for 3a. This example is based on the previous example of the chess
tournament given under (1a) in § 69. We take as prior evidence e the same as
there. Here, however, the observer X is interested in two hypotheses h and k,
which are predictions concerning the result of the tournament. h is: 'A local
player wins'; and k: 'A junior wins'. Among the ten players five are local
people; therefore c(h,e) = 5/10 = 1/2. The number of juniors is also five;
hence likewise c(k,e) = 1/2. The conjunction h . k says that a local junior
wins. There are three local juniors; hence c(h . k,e) = 3/10. Now X receives
the report i: 'A man wins'; it may be based on the result that all women are
out. (The sentences h, k, and i here are the same as i, j, and h,
respectively, in the example for (1a).) For the problems in inductive logic
as to what is the c of h, k, and their connections on the evidence e . i and,
consequently, what is the relevance of i to those hypotheses, it does not
matter, of course, at which time point the report i is given to X, and what
are the motives for the speaker to say no more than i; all that matters is
that X acquires, in addition to e, the knowledge of i and nothing else. It
may be, for instance, that the speaker himself knows only i; or, again, it
may be that he knows that not only all women are out but also some of the men
and that he does not care to specify the class of those who are still in
beyond saying that all of them are men; finally, as an extreme case, it may
be that the tournament is already finished and that the speaker knows who is
the winner but says to X merely that he is a man. Among the five male players
three are local; hence c(h,e . i) = 3/5. Thus the addition of i to e
increases the c of h from 1/2 to 3/5. Hence i is positive to h (always on e).
Among the five male players there are three juniors; hence c(k,e . i) = 3/5.
Thus the c of k is likewise increased from 1/2 to 3/5. Hence i is positive
also to k. On the other hand, there is only one local junior among the five
men; hence c(h . k,e . i) = 1/5. Thus the c of h . k is decreased from 3/10
to 1/5. Hence i is negative to h . k.

Example for 3b. Let e, h, and k be as in the example just given for (3a).
Hence the c-values on e are the same as there. However, instead of i we take
here i': 'A woman wins'. (This is the same as h' in the earlier example for
(1b) in § 69.) Among the five women, the number of local players is two, that
of juniors is two, and that of local juniors is also two. (These are always
the same two persons.) Hence c(h,e . i') = c(k,e . i') = c(h . k,e . i') =
2/5. Thus by the addition of i' to e, the c of h is decreased from 1/2 to
2/5; and likewise the c of k; but the c of h . k is increased from 3/10 to
2/5. Hence i' is negative to h and to k (on e) but positive to h . k.


Results in the Examples 3a and 3b

                          EXAMPLE 3a                    EXAMPLE 3b
HYPOTHESIS   c on e   c on e . i                    c on e . i'
h             0.5       0.6     i is positive         0.4     i' is negative
k             0.5       0.6     i is positive         0.4     i' is negative
h . k         0.3       0.2     i is negative         0.4     i' is positive

4a. It is possible that i is positive to each of two hypotheses h and k (on e)
and nevertheless negative to their disjunction. This occurs only in case
No. 6A in the table T69-1b. Examples can be constructed as described
under (2a) in § 69, but here with h, k, and i in the place of i, j, and h.

4b. It is possible that i is negative to each of two hypotheses h and k (on e)
and nevertheless positive to their disjunction. This occurs only in No. 9E.


Example for 4a. (This example is analogous to that for (2a) in § 69; h', k',
and i' here are the same as i', j', and h' there, respectively.) e is the
same as in the previous examples. h' is: 'A stranger wins'; k': 'A senior
wins'. Among the ten players the number of strangers is five, and that of
seniors also five. Hence c(h',e) = c(k',e) = 5/10 = 1/2. h' V k' says that
the winner is a stranger or a senior (possibly a stranger senior). The number
of those players who are strangers or seniors (not excluding stranger
seniors) is seven. Hence c(h' V k',e) = 7/10. Let i' be: 'A woman wins'. The
number of strangers among the five women is three, and likewise that of
seniors, and likewise that of those who are strangers or seniors. (These are
always the same three persons.) Hence c(h',e . i') = c(k',e . i') =
c(h' V k',e . i') = 3/5. Thus by the addition of i' to e the c of h' is
increased, and likewise the c of k'. On the other hand, the c of h' V k' is
decreased. Hence i' is positive to h' and to k' (on e), but negative to
h' V k'.

Example for 4b. We take e, h', and k' as in (4a). Therefore the c-values on e
are the same as in (4a). But instead of i' we take here i as in (3a): 'A man
wins'. Among the five men the number of strangers is two, and likewise that
of seniors; but the number of those who are strangers or seniors is four.
Hence c(h',e . i) = c(k',e . i) = 2/5; but c(h' V k',e . i) = 4/5. Thus i is
negative to h' and to k' (on e), but positive to h' V k'.


Results in the Examples 4a and 4b

                          EXAMPLE 4a                    EXAMPLE 4b
HYPOTHESIS   c on e   c on e . i'                   c on e . i
h'            0.5       0.6     i' is positive        0.4     i is negative
k'            0.5       0.6     i' is positive        0.4     i is negative
h' V k'       0.7       0.6     i' is negative        0.8     i is positive


The following theorem T5, which is analogous to T69-5, deals with the 
case where one hypothesis, either alone or together with e, L-implies the 
other. The theorem answers the question whether from the relevance of i 
to the one hypothesis something can be inferred about its relevance to the 
other. The answer is in the negative. T5 is based on the earlier theorem on
r-values in cases of deductive relations (T70-4).


T71-5. Let ⊢ e . h ⊃ k; hence e . h . ~k is L-false. (This holds too if
⊢ h ⊃ k; hence h . ~k is L-false.)
a. r(i,k,e) = r(i,h,e) + r(i,~h . k,e). (From T70-4b(3) and (4).)
b. (1) r(i,h,e) = r(i,k,e) - r(i,~h . k,e). (From (a).)
(2) = r(i,k,e) + r(i,h V ~k,e). (From (1), T67-5b, T21-5f(3).)


T5 shows that neither the relevance of i to k is determined by that of i
to h nor vice versa. In spite of the deductive relation holding between h
and k, there are still all nine combinations of r-signs possible. This is
shown by the table following T69-5, here interpreted as giving examples of
r-values for i to h, k, and their connections, the numbers (1) to (6)
referring to the table T69-1b.

Simple cases where L-implication holds between two hypotheses and
nevertheless i is positive to the one and negative to the other can easily
be found in the following way as special cases of the kinds 3a, 3b, 4a, and
4b discussed above.

3a. It is possible that i is positive to h (on e) but nevertheless negative
to h . k, although the latter L-implies the former. See the previous case
3a and the example for it.

3b. It is possible that i is negative to h but positive to h . k. See the
previous case 3b and the example for it.

4a. It is possible that i is positive to h but negative to h V k, although
the latter is L-implied by the former. See the previous case 4a and the
example for it.

4b. It is possible that i is negative to h but positive to h V k. See the
previous case 4b and the example for it.

In our later discussion on the classificatory concept of confirmation
we shall mention and examine certain principles stated by other authors
(§ 87). One of these principles (called the Special Consequence Condition)
says that, if i confirms h and ⊢ h ⊃ k, then i confirms k. If we assume
that the relation of confirming in this principle is meant in the sense
of what we, following Keynes, have called positive relevance (either on a
given evidence e or on the tautological evidence 't'), then the principle
is refuted by the case (4a) just explained. Another principle (called the
Converse Consequence Condition) that has been stated, though not together
with the first, says that, if i confirms k and ⊢ h ⊃ k, then i confirms
h. If we interpret 'confirming' again as above, then this principle is
refuted by the case (3a) just explained. Thus it seems important to become
clearly aware of this result of our preceding discussions: if we know merely
that i is positive to h on e (for example, if somebody tells us just this
without, however, specifying the three sentences), then it is not possible
for us to infer whether i is positive, negative, or irrelevant to a sentence
L-implied by h or to a sentence L-implying h.
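
The two refutations can be checked on the tournament numbers. The following sketch is illustrative only: the roster is a hypothetical one that reproduces the counts given in the examples, sentences are represented by their ranges (the sets of players they admit as winner), and 'confirms' is read as positive relevance, as in the paragraph above.

```python
from fractions import Fraction

# Hypothetical roster (an assumption), one tuple per player:
# (sex, residence, age group).
PLAYERS = (
    [("man", "local", "junior")]
    + [("man", "local", "senior")] * 2
    + [("man", "stranger", "junior")] * 2
    + [("woman", "local", "junior")] * 2
    + [("woman", "stranger", "senior")] * 3
)

def rng(pred):
    """Range of a sentence: indices of the players it admits as winner."""
    return frozenset(n for n, p in enumerate(PLAYERS) if pred(p))

def c(hyp, ev):
    """Proportion of the evidence range lying inside the hypothesis range."""
    return Fraction(len(hyp & ev), len(ev))

def sign(i, hyp, e):
    """+1, -1, or 0 according to whether i is positive, negative,
    or irrelevant to hyp on e."""
    diff = c(hyp, e & i) - c(hyp, e)
    return (diff > 0) - (diff < 0)

e = rng(lambda p: True)
man, woman = rng(lambda p: p[0] == "man"), rng(lambda p: p[0] == "woman")
local, junior = rng(lambda p: p[1] == "local"), rng(lambda p: p[2] == "junior")
stranger, senior = e - local, e - junior

# Special Consequence Condition, case 4a:
# h' L-implies h' V k', yet i' is positive to h' and negative to h' V k'.
special = (sign(woman, stranger, e), sign(woman, stranger | senior, e))

# Converse Consequence Condition, case 3a:
# h . k L-implies h, yet i is positive to h and negative to h . k.
converse = (sign(man, local, e), sign(man, local & junior, e))
```

Range inclusion stands in for L-implication here: `stranger <= stranger | senior` and `local & junior <= local` hold for any roster whatever.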


§ 72. Relevance Measures of State-Descriptions; First Method: Dis- 
junctive Analysis 


The first method for the relevance analysis of a sentence i in L_N consists
in analyzing i into its ultimate disjunctive components; these are the
state-descriptions Z in its range R_i. It is found that the relevance measure
r for i (to h on e) is the sum of the r-values for these Z (T7b). Since for
any Z outside of R_e r = 0 (T3c), the r for i is the sum of the r-values for
the Z in R(e . i) (T7d). This range consists of two parts, R(e . i . h) and
R(e . i . ~h), which we call R_1 and R_2, respectively. For every Z in R_1
r ≥ 0, for every Z in R_2 r ≤ 0 (T7c). A table is given (T8) which states,
for all possible cases of deductive relations between e, h, and i, the sign
and value of r for i (to h on e) and the sign of r for any Z in R_1 and in
R_2 and thereby the relevance of these sentences (to h on e).
We have seen that the relevance measure r is additive with respect to a
disjunction i with L-exclusive components (T68-1c), that is to say, the
r-value of i to a given h on e is the sum of the values for the components.
There are, of course, in general many ways of dividing a given i into
L-exclusive disjunctive components. If one such disjunctive representation
of i is known, then it will in general be possible to split its components
again into further L-exclusive disjunctive components. Let us restrict the
following discussion to sentences in a finite system L_N. Analyzing a
sentence i into L-exclusive disjunctive components is the same as dividing
its range R_i into exclusive (i.e., nonoverlapping) parts. Thus it is clear
that this procedure of further and further disjunctive analysis comes to an
end when we have reached the smallest nonnull ranges, in other words, when
we have reached state-descriptions Z as disjunctive components. The
range of Z_i contains just Z_i itself and nothing else. Therefore, we cannot
divide R(Z_i) into two nonempty parts. Hence it is not possible to transform
Z_i into a disjunction of two L-exclusive, non-L-false sentences in L_N.
If this ultimate disjunctive analysis of i into certain Z is carried out,
then the r-value for i (to h on e) can be determined as the sum of the values
for these Z. These latter values provide a more detailed characterization of
the relevance situation for i than the mere r-value of i itself. They reveal
how the latter value is, so to speak, built up out of its smallest parts. For
example, the r-value 0 for i may emerge in two quite different situations:
if r = 0 for every Z in question, then r must be 0 for i too; but r for i is
0 also in the case where some of the Z in question have positive r-values and
others have negative ones, provided these values balance each other.
Therefore, the r-values of the Z involved furnish a good basis for a closer
investigation of the relevance situation for i. This is what we call the
first method; it will be developed in this section.

Since r is additive also with respect to a conjunction with L-disjunctive
components, there is another method for the investigation of the relevance
situation. Here, i is analyzed into its smallest conjunctive parts. This
second method, which uses likewise certain Z but not the same as the first
method, will be dealt with in the next section.

The basic ideas of the first method now to be developed are quite simple.
Any non-L-false sentence i in L_N is L-equivalent to the disjunction of the
Z in R_i (T21-8c). These Z are L-exclusive in pairs (T21-8a). Hence,
according to the theorem of additivity for disjunctions (T68-1c), the
r-value for i (always meant to a given h on e) is the sum of the r-values for
the Z in R_i. Now R_i can be divided into two parts (one of which may be
empty), R(e . i) and R(~e . i). If Z_i is any Z in the latter part, then e
does not hold in Z_i and hence r for Z_i is 0, as we shall see. Thus the Z in
R(~e . i) contribute nothing to the r of i; hence the r of i is the sum of
the r-values for the Z in R(e . i). Now R(e . i) can again be divided into
two parts (possibly empty), R(e . i . h) and R(e . i . ~h), which we shall
call R_1 and R_2. We shall find that in general for all Z in R_1 r > 0; if,
however, e and h fulfil a certain special condition, then for all of those Z
r = 0; r < 0 cannot occur. On the other hand, for all Z in R_2 in most
cases r < 0; if e and h fulfil another special condition, then for all Z in
R_2 r = 0; r > 0 is not possible. We shall find theorems which state, for
the different possibilities of deductive relations between the three
sentences e, h, and i, the r-values for the Z in the two ranges mentioned and
the r-value for i based upon them.

It will be convenient for our discussions in the remainder of this chapter
to use some abbreviations. We construct the truth-table for e, i, and h;
then we use 'k_1', ..., 'k_8' for the eight conjunctions representing the
eight lines of the truth-table, as indicated in the following table. (For the
logical properties of these eight conjunctions see § 21B.)

        e    i    h
k_1     T    T    T      e . i . h
k_2     T    T    F      e . i . ~h
k_3     T    F    T      e . ~i . h
k_4     T    F    F      e . ~i . ~h
k_5     F    T    T      ~e . i . h
k_6     F    T    F      ~e . i . ~h
k_7     F    F    T      ~e . ~i . h
k_8     F    F    F      ~e . ~i . ~h

For 'R(k_n)' (n = 1 to 8), we write simply 'R_n'; for 'm(k_n)' 'm_n'.
(k_1 to k_4 and m_1 to m_4 are here the same as in § 65.)

For every n from 1 to 8, m_n = 0 if and only if k_n is L-false, hence if and
only if R_n is null. The eight k-sentences are L-exclusive in pairs (T21-7a).
If j is any non-L-false molecular sentence constructed out of e, i, and h,
then j is L-equivalent to a disjunction of some of the k-sentences (namely,
those corresponding to the lines of the truth-table for which j has the
truth-value T, T21-7d); hence R_j is the class-sum of the ranges of these
k-sentences, and m(j) is the sum of the m-values for these k-sentences.
(For example, e . i is L-equivalent to k_1 V k_2; R(e . i) consists of R_1
and R_2; m(e . i) = m_1 + m_2.)
The following theorem states the r-values for i, ~i, and some of the
k-sentences in terms of m-values.

T72-1. Let e, h, and i be any sentences in a finite or infinite system L
such that m has values for the arguments involved (k_1, etc.).

a. r(i,h,e) = m_1 × m_4 - m_2 × m_3. (This is T67-1.)

b. r(k_1,h,e) = m_1 × (m_2 + m_4).

Proof. According to D67-1, r(k_1,h,e) = m(e . h . e . i . h) × m(e) -
m(e . h) × m(e . e . i . h) = m_1 × m(e) - m(e . h) × m_1 =
m_1 × (m_2 + m_4) (T65-1e and c).

c. r(k_2,h,e) = -m_2 × (m_1 + m_3). (Analogous to (b).)

d. r(i,h,e) = r(k_1,h,e) + r(k_2,h,e). (From (a), (b), (c).)

e. r(k_3,h,e) = m_3 × (m_2 + m_4). (Analogous to (b).)

f. r(k_4,h,e) = -m_4 × (m_1 + m_3). (Analogous to (b).)

g. r(~i,h,e) = m_2 × m_3 - m_1 × m_4. (From T67-5a, (a).)

h. r(~i,h,e) = r(k_3,h,e) + r(k_4,h,e). (From (e), (f), (g).)
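
The identities of T72-1 can be checked numerically. The sketch below is an illustration, not a proof: the eight m-values are arbitrary assumed numbers, sentences are modelled as truth-functions of (e, i, h), and r is computed in the form in which D67-1 is applied in the proof of (b), r(i,h,e) = m(e . h . i) × m(e) - m(e . h) × m(e . i).

```python
from fractions import Fraction
from itertools import product

# Lines of the truth-table for (e, i, h); LINES[n - 1] corresponds to k_n.
LINES = list(product([True, False], repeat=3))

# Assumed m-values m_1..m_8 (arbitrary positive numbers summing to 1).
WEIGHTS = [Fraction(w, 36) for w in (3, 5, 7, 2, 4, 6, 1, 8)]

def m(sentence):
    """m of a molecular sentence: sum of m_n over the lines where it is true."""
    return sum((w for line, w in zip(LINES, WEIGHTS) if sentence(*line)),
               Fraction(0))

def conj(*sentences):
    """Conjunction of sentences given as truth-functions of (e, i, h)."""
    return lambda *line: all(s(*line) for s in sentences)

def r(i, h, e):
    """Relevance measure in the form used in the proof of T72-1b."""
    return m(conj(e, h, i)) * m(e) - m(conj(e, h)) * m(conj(e, i))

E = lambda e, i, h: e
I = lambda e, i, h: i
H = lambda e, i, h: h
NOT_I = lambda e, i, h: not i

def k(n):
    """The k-sentence k_n, true on line n of the truth-table only."""
    target = LINES[n - 1]
    return lambda *line: line == target

m1, m2, m3, m4 = WEIGHTS[:4]
print(r(I, H, E), r(k(1), H, E), r(k(2), H, E))
```

With these weights r(i,h,e) is negative, built from a positive contribution of k_1 and a larger negative contribution of k_2, which is the situation (d) describes.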

We shall now apply r to Z. T2 is a lemma with whose help we determine
the r-value of a Z_i in each of the eight partial ranges (T3, T4, T5). The
r-values for ~Z_i are stated also, because we shall need them later for the
second method. These theorems and most of the subsequent ones are
restricted to a finite system L_N because only here do we have Z as
sentences. (In L_∞, the Z are infinite classes of sentences; m, c, the
relevance concepts, and r have been defined for sentences only.) Hence we can
here use the theorem of the strict correspondence between relevance concepts
and r-values (T67-10).

T72-2. Lemma. Let e, h, and i be any sentences in L_N, and Z_i any
Z in L_N. Then r(Z_i,h,e) = m(e . h . Z_i) × m(e) - m(e . h) × m(e . Z_i).
(From D67-1.)

T72-3. Let Z_i be any Z (in L_N) in R_5, R_6, R_7, or R_8, hence in R(~e);
in other words, any Z in which e does not hold. Then the following is the
case.

a. (1) R(~e) is not null.
(2) ~e is not L-false. (From (1).)
(3) e is not L-true. (From (2).)

b. e . Z_i is L-false; ⊢ e ⊃ ~Z_i.

Proof. ~e holds in Z_i (T19-2); hence ⊢ Z_i ⊃ ~e (T20-2t); hence the
assertion.

c. r(Z_i,h,e) = 0. (From (b), T67-6d.)
d. r(~Z_i,h,e) = 0. (From (c), T67-5a.)
e. Z_i and ~Z_i are irrelevant to h on e. (From (c), (d), T67-10d.)

T72-4. Let Z_i be any Z (in L_N) in R_1 or R_3, hence in R(e . h); in other
words, any Z in which e and h hold.
a. (1) R_1 ∪ R_3, that is, R(e . h), is not null.
(2) e . h is not L-false. (From (1).)
(3) m_1 + m_3 > 0. (From (1).)
b. (1) r(Z_i,h,e) = m(e . ~h) × m(Z_i).
(2) = (m_2 + m_4) × m(Z_i).
Proof. 1. Z_i L-implies e . h and e (T20-2t). Therefore m(Z_i . e . h) =
m(Z_i . e) = m(Z_i) (T21-5i(1)). m(e) - m(e . h) = m(e . ~h) (T57-1l). Hence
(1) by T2. 2. From (1), T65-1d.
c. (1) r(~Z_i,h,e) = -m(e . ~h) × m(Z_i).
(2) = -(m_2 + m_4) × m(Z_i).
(From (b), T67-5a.)
d. Let e . ~h be L-false. Then the following holds.
(1) R_2 and R_4 are null.
(2) m_2 = m_4 = 0. (From (1).)
(3) r(Z_i,h,e) = 0. (From (d)(2), (b)(2).)
(4) r(~Z_i,h,e) = 0. (From (3), T67-5a.)
(5) Z_i and ~Z_i are irrelevant to h on e. (From (3), (4), T67-10d.)
e. Let e . ~h not be L-false.
(1) R_2 ∪ R_4 is not null.
(2) m_2 + m_4 > 0. (From (1).)
(3) r(Z_i,h,e) > 0. (From (e)(2), (b)(2).)
(4) Z_i is positive to h on e. (From (3), T67-10a.)
(5) r(~Z_i,h,e) < 0. (From (3), T67-5a.)
(6) ~Z_i is negative to h on e. (From (5), T67-10b.)

T72-5. Let Z_i be any Z (in L_N) in R_2 or R_4, hence in R(e . ~h); in
other words, any Z in which e and ~h hold.
a. (1) R_2 ∪ R_4, that is, R(e . ~h), is not null.
(2) e . ~h is not L-false. (From (1).)
(3) m_2 + m_4 > 0. (From (1).)
b. (1) r(Z_i,h,e) = -m(e . h) × m(Z_i).
(2) = -(m_1 + m_3) × m(Z_i).
Proof. 1. ⊢ Z_i ⊃ ~h (T20-2t), hence Z_i . h is L-false, hence
m(e . h . Z_i) = 0. Thus (1) by T2. 2. From (1), T65-1c.
c. (1) r(~Z_i,h,e) = m(e . h) × m(Z_i).
(2) = (m_1 + m_3) × m(Z_i).
(From (b), T67-5a.)
d. Let e . h be L-false. Then the following holds.
(1) R_1 and R_3 are null.
(2) m_1 = m_3 = 0. (From (1).)
(3) r(Z_i,h,e) = 0. (From (d)(2), (b)(2).)
(4) r(~Z_i,h,e) = 0. (From (3), T67-5a.)
(5) Z_i and ~Z_i are irrelevant to h on e. (From (3), (4), T67-10d.)
e. Let e . h not be L-false.
(1) R_1 ∪ R_3 is not null.
(2) m_1 + m_3 > 0. (From (1).)
(3) r(Z_i,h,e) < 0. (From (e)(2), (b)(2).)
(4) Z_i is negative to h on e. (From (3), T67-10b.)
(5) r(~Z_i,h,e) > 0. (From (3), T67-5a.)
(6) ~Z_i is positive to h on e. (From (5), T67-10a.)

We shall now see how the r-value for i can be determined as the sum of
the r-values for certain Z. T7 states this in general; T8 deals with cases
where deductive relations hold. These two theorems give the chief results
of what we have called the first method.


+T72-7. Let e, h, and i be any sentences in L_N.

a. Lemma. If i is not L-false, then i is L-equivalent to the disjunction
of the Z in R_i, hence in R_1, R_2, R_5, and R_6. (From T21-8c.)

b. r(i,h,e) = Σ r(Z_i,h,e) for all Z in R_i. (This and similar formulations
later are always to be understood in the sense that, if the range in
question, here R_i, is null, then r(i,h,e) = 0.)

Proof. 1. Let i be not L-false. Then R_i is not null. The Z are L-exclusive
in pairs (T21-8a). Hence the assertion by (a), T68-1c. 2. Let i be L-false.
Then e . i is L-false; hence r = 0 (T67-6d).

c. Every Z_i in R_i belongs to R_1 or R_2 or R_5 or R_6.
(1) If it belongs to R_1, r(Z_i,h,e) ≥ 0. (From T4d(3) and e(3).)
(2) If it belongs to R_2, r(Z_i,h,e) ≤ 0. (From T5d(3) and e(3).)
(3) If it belongs to R_5 or R_6, r(Z_i,h,e) = 0. (From T3c.)

d. r(i,h,e) = Σ r(Z_i,h,e) for all Z in R_1 and R_2 (hence in R(e . i)).
(From (b), (c)(3).)
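
The decomposition stated in T7b and T7d can be traced on a small model. The sketch below is an illustration under assumptions: the eight state-descriptions, their m-weights, and the ranges chosen for e, h, and i are made up for this example; sentences are represented by their ranges, so that conjunction becomes intersection.

```python
from fractions import Fraction

# Assumed toy model: eight state-descriptions Z_0..Z_7 with m-weights
# summing to 1.
M = {z: Fraction(w, 20) for z, w in enumerate([1, 2, 3, 4, 1, 2, 3, 4])}

def m(rng):
    """m of a sentence, given its range as a set of state-descriptions."""
    return sum((M[z] for z in rng), Fraction(0))

def r(i, h, e):
    """Relevance measure on ranges: m(e.h.i)*m(e) - m(e.h)*m(e.i)."""
    return m(e & h & i) * m(e) - m(e & h) * m(e & i)

# Assumed ranges of e, h, and i.
e = frozenset({0, 1, 2, 3, 4, 5})
h = frozenset({0, 1, 2, 6})
i = frozenset({0, 3, 4, 6, 7})

# r-value of each state-description in R_i, taken as a sentence by itself.
parts = {z: r(frozenset({z}), h, e) for z in sorted(i)}

R1 = e & i & h      # state-descriptions in R(e . i . h)
R2 = (e & i) - h    # state-descriptions in R(e . i . ~h)
total = sum(parts.values())

for z, value in parts.items():
    print(f"Z_{z}: r = {value}")
print("sum:", total, " r(i,h,e):", r(i, h, e))
```

In this model the single state-description in R_1 contributes a positive value, the two in R_2 contribute negative values, and the two outside R_e contribute nothing, so that r for i comes out negative.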

On the basis of the preceding theorems we can now determine the sign
and value of r for i (to h on e) and for those Z which influence the
relevance of i for all cases. The results are shown in the following table
T8. The possible cases are characterized by indicating (in columns (1) to
(4)) which of the values m_1, m_2, m_3, and m_4 are 0 and which >0; this
means here in L_N the same as stating which of the four sentences k_1, k_2,
k_3, and k_4 are L-false and which not. [The cases are in general listed in
an order which is, so to speak, lexicographical with respect to the first
four columns; only for Nos. 10 and 11 the inverse order is used because this
simplifies the later columns.] It will be seen that in all cases except the
last (No. 16), these indications concerning the m-values determine uniquely
all indications in the other columns, among them the sign and value of
r for i and for the Z in R_1 and R_2 and for the negations of the Z in R_3
and R_4 (which will be used in the next section), and hence the relevance of
all these sentences. No. 16, however, must be subdivided into three cases
(A), (B), and (C), according to whether m_1 × m_4 is greater, less, or equal
to m_2 × m_3. The last column (14) will be explained later (§§ 74-76).

The table T8 can be used in the following way. Suppose a statement is
given saying that certain deductive relations hold between certain
unspecified sentences e, i, and h, and that other deductive relations do not
hold between them. Then this statement says, in other terms, that some of the
sentences k_1, k_2, ..., k_8 are L-false and that some others are not; for
still others it may be left open whether they are L-false or not. We need
pay attention only to what is said about k_1, ..., k_4 because the status
of the other four k-sentences does not influence the r of i. k_n is L-false
if and only if m_n = 0. Thus, on the basis of the given information
concerning deductive relations, we can apply suitable lines of the table to
the case in question, and thereby obtain results concerning the relevance of
i and the r-value of i in terms of the m-values. More important, the table,
since it gives an exhaustive list of all possible cases, serves to establish
general theorems, e.g., T9 and some theorems in the following sections.
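
The use of the table just described can be imitated in a few lines of
Python. This is only a sketch under stated assumptions: it takes r(i,h,e)
to have the form m_1 × m_4 − m_2 × m_3, which is the form suggested by the
subdivision of No. 16, and it takes k_1, ..., k_4 to be e . i . h,
e . i . ~h, e . ~i . h, and e . ~i . ~h in that order. The function names
are hypothetical and are not part of the system.

```python
from fractions import Fraction

def r_value(m1, m2, m3, m4):
    # Assumed closed form of the relevance measure in terms of the
    # m-values of k1 = e.i.h, k2 = e.i.~h, k3 = e.~i.h, k4 = e.~i.~h.
    return m1 * m4 - m2 * m3

def relevance(m1, m2, m3, m4):
    """Classify i as positive, negative, or irrelevant to h on e."""
    r = r_value(m1, m2, m3, m4)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "irrelevant"

def l_false_pattern(m1, m2, m3, m4):
    # k_n is L-false if and only if m_n = 0; this is the information
    # recorded in columns (1) to (4) of the table.
    return tuple(value == 0 for value in (m1, m2, m3, m4))
```

For instance, with q = Fraction(1, 4), the call relevance(q, 0, q, q)
classifies a case in which only k_2 is L-false, and
l_false_pattern(q, 0, q, q) gives the corresponding entries for the first
four columns.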

The following theorem can simply be read off from the table T8. It
concerns the two parts of ℜ(e . i), viz., ℜ_1 and ℜ_2, whose ℨ are used in
the first method.


T72-9. Let e, h, and i be any sentences in 𝔏_N.

a. If both ℜ_1 and ℜ_2 are nonnull, then for every ℨ in ℜ_1 r > 0 (to h on
e), and for every ℨ in ℜ_2 r < 0.

b. If for any (and hence for every) ℨ in ℜ_1 r = 0 to h on e, then ℜ_2
is null.

c. If for any (and hence for every) ℨ in ℜ_2 r = 0 to h on e, then ℜ_1
is null.
(From T8, columns (10) and (11).)


§ 73. Second Method: Conjunctive Analysis 


The second method for the relevance analysis of a sentence i in 𝔏_N consists
in analyzing i into its weakest conjunctive components. These are the
negations of the ℨ in ℜ(~i); we call these negations the content-elements of
i, and their class the content of i (D1). (It is explained, incidentally,
that an alternative to the method which we have used earlier for the
construction of deductive logic is possible; while our earlier method based
deductive logic on the concept of range, the alternative method would use the
concept of content instead.) It is found here, in analogy to the results of
the first method, that the r-value for i (to h on e) is the sum of the values
for the content-elements of i (T3b). Since for the negation of any ℨ outside
of ℜ_e r = 0, the r for i is the sum of the r-values for the negations of the
ℨ in ℜ(e . ~i) (T3d). This range consists of two parts, ℜ(e . ~i . h) and
ℜ(e . ~i . ~h), which we call ℜ_3 and ℜ_4 respectively. For the negation of a
ℨ in ℜ_3 r ≤ 0, in ℜ_4 r ≥ 0. The signs of r for these content-elements of i
for all possible cases of deductive relations between i and h on the evidence
e are again listed in the earlier table (T72-8, columns (12) and (13)).

The first method of relevance analysis consisted in analyzing i into its
ultimate disjunctive components, which are the ℨ in ℜ_i. Since these
components are L-exclusive in pairs, the r-value of i (to h on e) is the sum
of the values for the components.

Now we shall develop a second method for the same purpose of splitting
up i into smallest parts in such a manner that the r-value for i is the sum
of the values for the parts. Here, we represent i not as a disjunction but as
a conjunction. The conjunctive analysis comes to an end when we come
down to the weakest factual sentences in 𝔏_N. These are the negations of
the state-descriptions in 𝔏_N. Let i be any non-L-true sentence. Let the ℨ
in which i does not hold, in other words, the ℨ in ℜ(~i), be ℨ_1, ℨ_2, ...,
ℨ_n. Then i is L-equivalent to ~ℨ_1 . ~ℨ_2 . ... . ~ℨ_n. [This is T21-8f.
It is easily seen as follows. ~i is L-equivalent to the disjunction of the ℨ
in its range, hence to ℨ_1 ∨ ℨ_2 ∨ ... ∨ ℨ_n. Therefore, i is L-equivalent to
the negation of this disjunction, hence to the conjunction with negated
components. This result is also immediately plausible, because what i says
is this: the universe is not in the state described by ℨ_1, nor in that
described by ℨ_2, nor ..., nor in that described by ℨ_n.] The range of ~ℨ_i
is the class of all ℨ distinct from ℨ_i; thus it is a largest nonuniversal
range. Therefore ~ℨ_i is a weakest sentence which is not yet L-true but still
factual. In other words, ~ℨ_i cannot be divided into different (i.e.,
non-L-equivalent) factual conjunctive components. If ℨ_i and ℨ_j are any two
distinct ℨ, then ℨ_i . ℨ_j is L-false; hence ⊢ ~ℨ_i ∨ ~ℨ_j. Thus any two
of the sentences ~ℨ_1, ..., ~ℨ_n are L-disjunct. Therefore the simpler
additivity theorem of r for conjunctions (T68-2c) can be applied to
the conjunction of these sentences: r(i,h,e) = r(~ℨ_1,h,e) + ... +
r(~ℨ_n,h,e).
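
The additivity just stated can be checked by brute force in a small model.
The following Python sketch is an illustration under stated assumptions, not
a part of the proof: a sentence is represented by its range (a set of indices
of state-descriptions), the weights in M are an arbitrarily chosen regular
m-function, and r is taken in the form r(i,h,e) = m(e . i . h) × m(e) −
m(e . i) × m(e . h), which is assumed here as the definition from § 67. All
names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

N = 4                          # toy system with four state-descriptions
ALL = frozenset(range(N))      # the universal range
# Hypothetical regular m-function: positive weights that sum to 1.
M = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

def m(rng):
    """m-value of a sentence, given its range as a set of indices."""
    return sum((M[z] for z in rng), Fraction(0))

def r(i, h, e):
    # Assumed definition: r(i,h,e) = m(e.i.h) * m(e) - m(e.i) * m(e.h).
    return m(e & i & h) * m(e) - m(e & i) * m(e & h)

def content_elements(i):
    """Ranges of the negations ~Z_1, ..., ~Z_n of the Z in R(~i)."""
    return [ALL - {z} for z in sorted(ALL - i)]

def sentences():
    """All sentences up to L-equivalence, one for each possible range."""
    for size in range(N + 1):
        for members in combinations(range(N), size):
            yield frozenset(members)

def additivity_holds():
    """Check r(i,h,e) = r(~Z_1,h,e) + ... + r(~Z_n,h,e) for every triple."""
    everything = list(sentences())
    return all(
        r(i, h, e) == sum((r(l, h, e) for l in content_elements(i)),
                          Fraction(0))
        for i in everything for h in everything for e in everything
    )
```

Exact fractions are used so that the comparison is not disturbed by rounding;
additivity_holds() runs over all 16 × 16 × 16 triples of ranges in this toy
system.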

This situation makes it appear convenient to introduce a term for the
class {~ℨ_1, ~ℨ_2, ..., ~ℨ_n}, that is, the class of the weakest
conjunctive components into which i can be analyzed. We shall call it the
L-content or, briefly, the content of i, and its elements the
content-elements of i.

+D73-1. Let j be any sentence in 𝔏_N.

a. l is a content-element (in 𝔏_N) =Df l is the negation of a ℨ (in 𝔏_N).

b. l is a content-element of j (in 𝔏_N) =Df l is the negation of a ℨ in
which j does not hold (in other words, the negation of a ℨ in ℜ(~j)).

c. The content of j (in 𝔏_N) =Df the class of the content-elements of j.


The following theorem follows simply from these definitions. It states 
the relations between content and the L-concepts. 

+T73-1. Let j and k be any sentences in 𝔏_N.

a. The content of j is universal, that is, it contains all content-elements
in 𝔏_N, if and only if ℜ_j is null, hence if and only if j is L-false.

b. The content of j is null if and only if ℜ_j is universal, hence if and
only if j is L-true.

c. The content of j is included in (or, a subclass of) the content of k
if and only if ℜ_k is included in ℜ_j, hence if and only if k L-implies j.

d. j and k have the same content if and only if they have the same
range, hence if and only if they are L-equivalent.
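
The four parts of this theorem can be made concrete with sets. In the
following Python sketch, which is only an illustration, a sentence is
identified with its range, and each content-element ~ℨ is represented by the
index of the ℨ which it negates; both representations and all function names
are assumptions made for the example.

```python
from itertools import combinations

N = 3                          # toy system with three state-descriptions
ALL = frozenset(range(N))      # the universal range

def content(j):
    # The content of j collects one content-element ~Z for every Z in
    # which j does not hold; ~Z is represented by the index of Z.
    return ALL - j

def l_implies(k, j):
    """k L-implies j if and only if the range of k is included in that of j."""
    return k <= j

def sentences():
    """All sentences up to L-equivalence, one for each possible range."""
    for size in range(N + 1):
        for members in combinations(range(N), size):
            yield frozenset(members)

def check_t73_1():
    everything = list(sentences())
    for j in everything:
        assert (content(j) == ALL) == (len(j) == 0)        # (a) j L-false
        assert (len(content(j)) == 0) == (j == ALL)        # (b) j L-true
        for k in everything:
            # (c) inclusion of contents reverses inclusion of ranges
            assert (content(j) <= content(k)) == l_implies(k, j)
            # (d) same content if and only if same range
            assert (content(j) == content(k)) == (j == k)
    return True
```

The check makes visible why the stronger sentence has the larger content:
passing from a range to its complement reverses every inclusion.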


We see from this theorem that the class we call ‘content of j’ is the 
more comprehensive the stronger j is, or the more is asserted by j. Thus 
this class serves well to represent the assertive strength of j. This becomes 
still more clear when we remember that the assertive power of a sentence 
consists in its excluding certain possible cases; this has been pointed out 
by Karl Popper ([Logik], p. 67). Therefore the use of the term ‘content’ 
seems justified. 

Remark for readers of [Semantics]. The definition for ‘content’ given here in
D1c differs from the various tentative definitions for ‘L-content’ in
[Semantics] (D23-B1, D23-F1, and D23-G1), which were meant as explications
for the same explicandum. However, the present definition fulfils the earlier
two Postulates for L-content ([Semantics] P23-1 and 2), which, if restricted
to sentences, state that the L-content of j is included in that of i if and
only if ⊢ i ⊃ j; this is the present theorem T1c. Therefore, the new concept
fulfils likewise the theorems which were based on those two postulates, among
them especially the following two: (1) the L-content of j is included in that
of i if and only if the L-range of i is included in that of j ([Semantics]
T23-20; now T1c); (2) the L-contents of i and j are identical if and only if
their ranges are identical ([Semantics] T23-21; now T1d).

My previous reference to Wittgenstein in this context ([Semantics], p. 151) 
was due to an error of memory; it should have been instead to Popper, as above. 


Seeing the connection, expressed in T1, between content and L-concepts,
we can easily imagine a method for the construction of deductive
logic alternative to the method previously used (in chap. iii). While the
earlier method used the concept of range, the alternative would use the
concept of content. Instead of beginning with state-descriptions, this
method would begin with the content-elements. (Instead of giving them
the form of negations of conjunctions of basic sentences, as in D1a, the
simpler form of disjunctions of basic sentences might then be taken.)
Instead of the rules of ranges (D18-4), which determine for every sentence
in which of the ℨ it holds, we should here lay down analogous rules of
contents determining for every sentence its content-elements. The content
of a sentence would then be defined as the class of its content-elements.
Then the definitions for the L-concepts would be laid down in
terms of contents, in analogy to their definitions in terms of ranges in the
earlier method (D20-1); the sufficient and necessary conditions for some
L-concepts in terms of contents which we have stated in T1 would be
taken as the defining conditions in this alternative method.

It would, of course, be possible to use both the concepts of range and 
of content in the system of deductive logic. In order not to complicate 
our construction of the system (in chap. iii) too much, we have decided 
to use only one of the two concepts. If it were only a question of construct- 
ing deductive logic, I think there would not be any difference between the 
two procedures from the point of view of simplicity and convenience. 
However, for inductive logic, the theory of degree of confirmation, it 
seems to me to be more convenient to take the concept of range as funda- 
mental. The m- and c-functions are, as we have seen, additive with respect 
to disjunction but not with respect to conjunction (see the explanations 
in § 68). Therefore the concept of range, which is based on disjunctive 
analysis, is more suitable for the theory of those functions than the con- 
cept of content, which is based on conjunctive analysis. This is the reason 
why we have chosen the concept of range and not that of content also 
for our system of deductive logic, since this system was to be used as a 
foundation for inductive logic. 

In the present special branch of inductive logic, the theory of the rele- 
vance measure r, the situation is again different. Since r is additive with 
respect to both conjunction and disjunction (under certain conditions), 
the concept of content is here just as useful as the concept of range, and in 
the second method of relevance analysis it is even the more useful of the 
two concepts since this method is based on conjunctive analysis. That is 
the reason why we have introduced here the concept of content. However, 
we shall use this concept here chiefly in an auxiliary function, for the pur- 
pose of facilitating the understanding of the general nature of the second 
method. For the technical work, we shall still have to use the concept of
range too, because the whole of the preceding theory, which we have to
use for the proofs of our theorems here, is framed in terms of ranges.

The result of our previous discussion of the second method (in the
beginning of this section) can now be formulated in terms of content as
follows: the r-value for i (to h on e) is the sum of the values for the
content-elements of i.

In the preceding section we have determined the r-values (always to a
given h on e) for all content-elements in 𝔏_N, that is, for all sentences of
the form ~ℨ_i, where ℨ_i belongs to any of the ranges ℜ_1 to ℜ_8 (for ℜ_5,
ℜ_6, ℜ_7, and ℜ_8, this was done in T72-3d; for ℜ_1 and ℜ_3 in T72-4c, d(4),
and e(5); for ℜ_2 and ℜ_4 in T72-5c, d(4), and e(5)). The range of ~i
consists of ℜ_3, ℜ_4, ℜ_7, and ℜ_8. Therefore, the content of i consists of
the negations of the ℨ in ℜ_3, ℜ_4, ℜ_7, and ℜ_8. The r-value of i is the sum
of the values for these negations. However, we may omit here ℜ_7 and ℜ_8,
because we have found that, for any ℨ_i in these ranges, r(~ℨ_i,h,e) = 0
(T72-3d); thus it is sufficient to consider the ℨ in ℜ_3 and ℜ_4. In other
words, in studying ℜ(~i), we may omit its part ℜ(~e . ~i); we need
only consider the remainder, that is ℜ(e . ~i).

In accordance with the basic ideas of the second method so far ex- 
plained informally, we shall now develop this method technically in the 
following theorems. T3 is analogous to T72-7; it is the fundamental 
theorem of the second method. 


+T73-3. Let e, h, and i be any sentences in 𝔏_N.

a. Lemma. If i is not L-true, then i is L-equivalent to the conjunction
of the content-elements of i, i.e., the conjunction of the negations
of the ℨ in ℜ(~i), hence in ℜ_3, ℜ_4, ℜ_7, and ℜ_8. (From T21-8f.)

b. r(i,h,e) = Σ r(~ℨ_i,h,e) for all content-elements of i, that is, for the
negations of all ℨ in ℜ(~i). (If ℜ(~i) is null, r = 0.)

Proof. 1. Let ℜ(~i) be not null. Negations of distinct ℨ are L-disjunct
(T21-8g). Hence the assertion by (a), T68-2c. 2. Let ℜ(~i) be null. Then
m(e . ~i) = 0; hence r = 0 (T67-6e).

c. Every ℨ_i in ℜ(~i) belongs to ℜ_3 or ℜ_4 or ℜ_7 or ℜ_8.

(1) If ℨ_i belongs to ℜ_3, r(~ℨ_i,h,e) ≤ 0. (From T72-4d(4) and e(5).)
(2) If ℨ_i belongs to ℜ_4, r(~ℨ_i,h,e) ≥ 0. (From T72-5d(4) and e(5).)
(3) If ℨ_i belongs to ℜ_7 or ℜ_8, r(~ℨ_i,h,e) = 0. (From T72-3d.)

d. r(i,h,e) = Σ r(~ℨ_i,h,e) for the negations of all ℨ in ℜ_3 and ℜ_4
(hence in ℜ(e . ~i)). (From (b), (c)(3).)

We have earlier stated the signs of r for the content-elements of i
mentioned in T3d, viz., the negations of the ℨ in ℜ_3 and ℜ_4, for all the
possible cases of deductive relations between i and h with respect to e (see
columns (12) and (13) of table T72-8). The following theorem follows from
these columns; it is the analogue to T72-9. More important consequences
will be drawn from these columns later (§ 75).


T73-4. Let e, h, and i be any sentences in 𝔏_N.

a. If both ℜ_3 and ℜ_4 are nonnull, then for the negation of every ℨ in
ℜ_4 r > 0 (to h on e); and for the negation of every ℨ in ℜ_3 r < 0.

b. If for the negation of any (and hence of every) ℨ in ℜ_3 r = 0 (to
h on e), then ℜ_4 is null.

c. If for the negation of any (and hence of every) ℨ in ℜ_4 r = 0 (to h
on e), then ℜ_3 is null. (From T72-8, columns (12) and (13).)


On the basis of the second method, the r-value of i (to h on e) and the
relevance of i are, of course, the same as before, as stated in columns (8)
and (9) of the table T72-8; because the second method is merely a different
way to the same aim. What is different here is only the relevance-elements,
so to speak, out of which the relevance of i is composed. In the
first method we split up the r of i into the values for the ℨ in the range
of i, and, more particularly, in ℜ_1 and ℜ_2 as stated in columns (10) and
(11). Here, in the second method, we split up the same r-value of i in a
different manner, into the values for the content-elements of i, and, more
particularly, for the negations of the ℨ in ℜ_3 and ℜ_4 as stated in columns
(12) and (13).


§ 74. Extreme Relevance 


We call i extremely positive (to h on e) if i is positive and no stronger
sentence j (i.e., such that ⊢ e . j ⊃ i) is negative (D1a). In this case, in
𝔏_N, by the addition of i to e, the c of h is increased to the maximum value
1 (T1i). Extreme negative relevance is defined analogously (D1b); here, c is
diminished to the minimum value 0 (T2i). i is called extremely irrelevant if
i is irrelevant and no stronger sentence is relevant (D1d). This concept
occurs only in certain trivial cases (T3d). These concepts are here studied
with the help of the first method (§ 72), the analysis of i as disjunction of
the ℨ in its range.


In this and the next sections we shall discuss some special cases of
relevance. If i is positive to h on e, then the situation is in general such
that among the disjunctive parts into which we divide i by what we have
called the first method some are positive and others are negative and maybe
still others irrelevant. However, it may occur that none of the parts is
negative. This case is of special interest, and we shall introduce a new
concept for it. Likewise, it may occur that none of the conjunctive parts
into which we analyze i by the second method is negative; this leads to
another new concept which will be introduced in the next section. Both
concepts may hold simultaneously, but this is not always the case.

We begin with the first method as explained in § 72. Here i is analyzed
into its ultimate disjunctive parts, namely, the ℨ in its range. As we have
seen (T72-7), ℜ_i consists of ℜ_1, ℜ_2, ℜ_5, and ℜ_6; however, for every ℨ in
the latter two ranges r = 0, while for those in ℜ_1 r ≥ 0, and for those in
ℜ_2 r ≤ 0. The special case in which we are now interested is the case
where for i r > 0 and for no disjunctive part of it r < 0; this means
that for at least one, and hence for every, ℨ in ℜ_1 r > 0 and for no ℨ in
ℜ_2 r < 0. The table T72-8 will help us to study this and similar other
cases; we see easily from the columns (10) and (11) that the situation just
described holds in the cases Nos. 11 and 12 and in no others. If the
conditions described are fulfilled, we shall say that i is extremely
positively relevant or, briefly, extremely positive to h on e. We shall
however not use for the definition the condition as just formulated because
this analysis of relevance in terms of ℨ applies only to 𝔏_N. In order to
make the definition applicable to all finite and infinite systems, we have to
refer not to the ℨ but generally to any disjunctive part of i, in other
words, to any sentence j L-implying i (if i is L-equivalent to j ∨ k, then j
L-implies i). We shall even go one step further and require that any sentence
j which either alone or together with e L-implies i is not negative (D1a); in
fact, it makes no difference whether we add “either alone or together with e”
or not. The term ‘extremely negative’ will then be defined in an analogous
way (D1b). We shall see that at least in 𝔏_N the following holds: i is
extremely positive to h on e if and only if the addition of i to e increases
the c of h to the maximum value 1; and i is extremely negative to h on e if
and only if the addition of i to e decreases the c of h to the minimum
value 0. This is the reason for the choice of the term ‘extremely’. Finally
we shall say that i is extremely irrelevant to h on e if i is irrelevant and
no disjunctive part of it or, more exactly, no sentence which either alone
or together with e L-implies i, is relevant (D1c). (In this case the term
‘extremely’ is used only for the sake of analogy with the two other
concepts.) This concept of extreme irrelevance is not very important because
it holds only in certain trivial cases.


D74-1. Let e, h, and i be any sentences in a finite or infinite system 𝔏.
Let c be a regular c-function in 𝔏; the relevance concepts (‘positive’, etc.)
are meant with respect to c in 𝔏.

+a. i is extremely positive to h on e (with respect to c in 𝔏) =Df (1) i
is positive to h on e, and (2) for every sentence j in 𝔏, if ⊢ e . j ⊃ i
then j is not negative to h on e.

+b. i is extremely negative to h on e (with respect to c in 𝔏) =Df (1) i is
negative to h on e, and (2) for every j, if ⊢ e . j ⊃ i then j is not
positive to h on e.

c. i is extremely relevant to h on e (with respect to c in 𝔏) =Df i is
either extremely positive or extremely negative to h on e.

d. i is extremely irrelevant to h on e (with respect to c in 𝔏) =Df (1) i is
irrelevant to h on e, and (2) for every j, if ⊢ e . j ⊃ i then j is not
relevant to h on e.

One might perhaps think of replacing the condition D1a(2) by a stronger
one requiring that j be positive. This would lead, however, to undesirable
consequences; see the later remark concerning an analogous change in D75-1.


For the sake of simplicity, the following theorems refer again to finite
systems 𝔏_N only. Here the relations stated hold without exceptions; and
they can easily be proved with the help of the theorems of § 72 concerning
ℨ, and especially the table T72-8. The theorems of the present section,
as far as they do not refer to ℨ, hold likewise for nongeneral sentences
in 𝔏_∞; but for general sentences in 𝔏_∞ they hold only under certain
restricting conditions.

T1 gives various conditions which are logically equivalent to extreme
positive relevance; T2 does the same for extreme negative relevance, and
T3 for extreme irrelevance.

T74-1. Let e, h, and i be sentences in 𝔏_N. Each of the following
conditions (a) to (i) is sufficient and necessary for i to be extremely
positive to h on e.

+a. i is positive to h on e, and for no ℨ in ℜ_i r < 0.

Proof. 1. Let i be extremely positive. If ℨ_i is in ℜ_i, then ⊢ ℨ_i ⊃ i.
Therefore ℨ_i is not negative (D1a(2)), hence its r is not < 0 (T67-10b).
2. Let i be positive and let r for no ℨ in ℜ_i be < 0. Let j be any sentence
such that ⊢ e . j ⊃ i. Then ℜ(e . j) ⊂ ℜ_i. Hence for no ℨ in ℜ(e . j)
r < 0. Therefore r for j is not < 0 (T72-7d); hence j is not negative
(T67-10b). Since this holds for every j, i is extremely positive.

b. Of the cases in the table T72-8 either No. 11 or No. 12 holds for e,
h, and i. (From (a), with T72-8, columns (9), (10), and (11).)

+c. m_2 = 0, m_1 > 0, and m_4 > 0. (From (b), T72-8, columns (1), (2), (4).)

d. e . i . ~h is L-false (hence ⊢ e . i ⊃ h); e . i and e . ~h are not
L-false (hence not ⊢ e ⊃ ~i and not ⊢ e ⊃ h). (From (c).)

e. m_2 = 0, and r(i,h,e) = m_1 × m_4. (From (b), T72-8, column (8).)

f. m_2 = 0, and r(i,h,e) > 0. (From (b), T72-8, column (8).)

+g. i is positive, and e . i . ~h is L-false (hence ⊢ e . i ⊃ h). (From (b),
T72-8, columns (9) and (2).)

+h. i is positive, and c(h,e . i) = 1. (From (g), T59-1b, T59-5b.)

+i. c(h,e . i) = 1, and c(h,e) < 1. (From (h), D65-1a.)
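
The equivalence of the defining condition with (c) and (i) can be tried out
in a small model. The following Python sketch is an illustration under
stated assumptions and not a substitute for the proofs: sentences are
represented by their ranges, M is an arbitrarily chosen regular m-function,
r is taken in the assumed form m(e . i . h) × m(e) − m(e . i) × m(e . h),
c(h,e) is taken as m(e . h)/m(e), and ‘positive’ and ‘negative’ are read as
r > 0 and r < 0. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

N = 4                          # toy system with four state-descriptions
# Hypothetical regular m-function: positive weights that sum to 1.
M = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
# Every sentence up to L-equivalence, represented by its range.
SENTENCES = [frozenset(s) for k in range(N + 1)
             for s in combinations(range(N), k)]

def m(rng):
    return sum((M[z] for z in rng), Fraction(0))

def r(i, h, e):
    # Assumed definition: r(i,h,e) = m(e.i.h) * m(e) - m(e.i) * m(e.h).
    return m(e & i & h) * m(e) - m(e & i) * m(e & h)

def c(h, e):
    """c(h,e) = m(e.h)/m(e); None where e is L-false and c has no value."""
    return m(e & h) / m(e) if m(e) > 0 else None

def by_definition(i, h, e):
    # D74-1a: i is positive, and no j such that e.j L-implies i is negative.
    return r(i, h, e) > 0 and all(
        r(j, h, e) >= 0 for j in SENTENCES if (e & j) <= i)

def by_condition_c(i, h, e):
    # T74-1c: m_2 = 0, m_1 > 0, and m_4 > 0.
    m1, m2, m4 = m(e & i & h), m((e & i) - h), m((e - i) - h)
    return m2 == 0 and m1 > 0 and m4 > 0

def by_condition_i(i, h, e):
    # T74-1i: c(h, e.i) = 1 and c(h,e) < 1.
    before, after = c(h, e), c(h, e & i)
    return after == 1 and before is not None and before < 1

def conditions_agree():
    return all(
        by_definition(i, h, e) == by_condition_c(i, h, e)
        == by_condition_i(i, h, e)
        for i in SENTENCES for h in SENTENCES for e in SENTENCES)
```

The function by_definition quantifies over all sentences j, as D74-1a
requires, while the two other functions inspect only a few m-values; that
the three agree on every triple of this model is what conditions_agree()
tests.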

T74-2. Let e, h, and i be sentences in 𝔏_N. Each of the following
conditions (a) to (i) is sufficient and necessary for i to be extremely
negative to h on e.

+a. i is negative to h on e, and for no ℨ in ℜ_i r > 0. (From D1b,
T67-10a, T72-7d, in analogy to T1a.)

b. Of the cases in the table T72-8 either No. 7 or No. 8 holds for e, h,
and i. (From (a), with T72-8, columns (9), (10), and (11).)

+c. m_1 = 0, m_2 > 0, and m_3 > 0. (From (b), T72-8(1), (2), (3).)

d. e . i . h is L-false (hence ⊢ e . i ⊃ ~h, i and h are L-exclusive with
respect to e); e . i and e . h are not L-false (hence not ⊢ e ⊃ ~i and
not ⊢ e ⊃ ~h). (From (c).)

e. m_1 = 0, and r(i,h,e) = −m_2 × m_3. (From (b), T72-8(8).)

f. m_1 = 0, and r(i,h,e) < 0. (From (b), T72-8(8).)

+g. i is negative, and e . i . h is L-false (hence ⊢ e . i ⊃ ~h). (From (b),
T72-8(9) and (1).)

+h. i is negative, and c(h,e . i) = 0. (From (g), T59-1e, T59-5c.)

+i. c(h,e . i) = 0, and c(h,e) > 0. (From (h), D65-1b.)

T74-3. Let e, h, and i be sentences in 𝔏_N. Each of the following
conditions (a) to (h) is sufficient and necessary for i to be extremely
irrelevant to h on e.

+a. i is irrelevant to h on e, and for every ℨ in ℜ_i r = 0.

Proof. 1. Let i be extremely irrelevant. If ℨ_i is in ℜ_i, then ⊢ ℨ_i ⊃ i.
Therefore ℨ_i is not relevant (D1c(2)), hence its r is 0 (T67-10c). 2. Let i
be irrelevant, and let r = 0 for every ℨ in ℜ_i. Let j be any sentence such
that ⊢ e . j ⊃ i. Then ℜ(e . j) ⊂ ℜ_i. Hence for every ℨ in ℜ(e . j) r = 0.
Therefore r for j is 0 (T72-7d), and hence j is not relevant (T67-10c). Since
this holds for every j, i is extremely irrelevant.

b. One of the cases Nos. 1 to 6, 9, and 10 in the table T72-8 holds for e,
h, and i. (From (a), T72-8(9), (10), and (11).)

c. Either all four values m_1, m_2, m_3, and m_4 are 0, or any three of them
are 0, or m_1 and m_2 are 0, or m_1 and m_3 are 0, or m_2 and m_4 are 0.
(From (b), T72-8(1) to (4).)

+d. At least one of the sentences e . i, e . h, and e . ~h is L-false. (From
(b), T72-8(5), (6), (7).)

e. i is irrelevant to h on e, and either m_1 = 0 or m_2 = 0. (From (b),
T72-8(9), (1), and (2).)

f. Either e . i is L-false, or c(h,e) is either 0 or 1. (From (d), T59-1b
and e, T59-5b and c.)

+g. Either e . i is L-false, or c(h,e) and c(h,e . i) are both 0 or both 1.
(From (d), T59-1b and e, T59-5b and c.)

The results of T1b, T2b, and T3b, based on columns (9), (10), and (11)
of Table T72-8, are now entered in column (14).

T3d shows that extreme irrelevance is not a very useful concept since
it holds, at least in 𝔏_N, only in the following two trivial cases for i to
h on e: (1) e L-implies h or ~h; (2) e . i is L-false. In the case (1), the
prior evidence e decides already completely about the hypothesis h either
affirmatively or negatively; in this case every sentence is irrelevant and,
moreover, extremely irrelevant to h on e. In the case (2), the sentence i
describes an event which on the evidence e is no longer possible; hence i
cannot occur as additional evidence to e. Nevertheless it is interesting to
see that the defining condition in D1d does not hold in any other cases
but the trivial ones described in T3d.


The following theorem shows that for any ℨ, relevance or irrelevance is
always extreme.

T74-4. Let e and h be sentences in 𝔏_N, and ℨ_i be any ℨ in 𝔏_N.

a. If ℨ_i is positive to h on e, then it is extremely positive.

Proof. ℨ_i is the only ℨ in ℜ(ℨ_i) (T19-6). For ℨ_i r > 0 (T67-10a). Hence
the assertion by T1a.

b. If ℨ_i is negative to h on e, then it is extremely negative. (From T2a,
in analogy to (a).)

c. If ℨ_i is irrelevant to h on e, then it is extremely irrelevant. (From
T3a, in analogy to (a).)

d. ℨ_i is either extremely positive to h on e, or extremely negative, or
extremely irrelevant. (From T65-48, (a), (b), (c).)

+T74-6. Let e, h, i, and j be sentences in 𝔏_N such that ⊢ e . j ⊃ i.
Then the following holds.

a. If i is extremely positive to h on e and e . j is not L-false, then j is
likewise extremely positive.

Proof. ⊢ e . i ⊃ h (T1d). ⊢ e . j ⊃ e . i, hence ⊢ e . j ⊃ h. Hence the
assertion with T1d.

b. If i is extremely negative to h on e, and e . j is not L-false, then j is
likewise extremely negative. (From T2d, in analogy to (a).)

c. If i is extremely irrelevant to h on e, then j is likewise.

Proof. 1. Let e . i be L-false. Since ⊢ e . j ⊃ e . i, e . j is also L-false.
Hence the assertion by T3d. 2. Let e . i be not L-false. Then either e . h or
e . ~h is L-false (T3d). Then every sentence is extremely irrelevant (T3d),
hence also j.




T6 shows that extreme positive or negative relevance or irrelevance are
transmitted from i to any stronger sentence j not excluded by e.

The simple relevance concepts are dependent upon the particular c-function
chosen and hence upon the underlying m-function. For instance,
i may be positive to h on e with respect to one regular c-function and
negative with respect to another. On the other hand, the extreme relevance
concepts are independent of the particular c-function chosen. For instance,
if i is extremely positive to h on e with respect to any one regular
c-function, then it is likewise with respect to every other one. This is
stated by the following theorem.


T74-7. Let c and c′ be any regular c-functions for 𝔏_N, and h, e, and i any
sentences in 𝔏_N.

a. If i is extremely positive to h on e with respect to c, then likewise
with respect to c′. (From T1d.)

b. If i is extremely negative to h on e with respect to c, then likewise
with respect to c′. (From T2d.)

c. If i is extremely relevant to h on e with respect to c, then likewise
with respect to c′. (From (a), (b).)

d. If i is extremely irrelevant to h on e with respect to c, then likewise
with respect to c′. (From T3d.)
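
The contrast between the simple and the extreme relevance concepts can be
seen in a small model with two different m-functions. The following Python
sketch is only an illustration: the weights M_A and M_B are arbitrarily
chosen regular m-functions, r is taken in the assumed form
m(e . i . h) × m(e) − m(e . i) × m(e . h), sentences are represented by
their ranges, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

N = 4                          # toy system with four state-descriptions
# Every sentence up to L-equivalence, represented by its range.
SENTENCES = [frozenset(s) for k in range(N + 1)
             for s in combinations(range(N), k)]
# Two hypothetical regular m-functions for the same state-descriptions.
M_A = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
M_B = [Fraction(4, 10), Fraction(1, 10), Fraction(1, 10), Fraction(4, 10)]

def m(weights, rng):
    return sum((weights[z] for z in rng), Fraction(0))

def r(weights, i, h, e):
    # Assumed definition: r(i,h,e) = m(e.i.h) * m(e) - m(e.i) * m(e.h).
    return (m(weights, e & i & h) * m(weights, e)
            - m(weights, e & i) * m(weights, e & h))

def extremely_positive(weights, i, h, e):
    # D74-1a: i is positive, and no j such that e.j L-implies i is negative.
    return r(weights, i, h, e) > 0 and all(
        r(weights, j, h, e) >= 0 for j in SENTENCES if (e & j) <= i)

def extremely_negative(weights, i, h, e):
    # D74-1b: i is negative, and no j such that e.j L-implies i is positive.
    return r(weights, i, h, e) < 0 and all(
        r(weights, j, h, e) <= 0 for j in SENTENCES if (e & j) <= i)

def extreme_concepts_agree():
    """Compare both extreme concepts under M_A and M_B for every triple."""
    return all(
        extremely_positive(M_A, i, h, e) == extremely_positive(M_B, i, h, e)
        and extremely_negative(M_A, i, h, e) == extremely_negative(M_B, i, h, e)
        for i in SENTENCES for h in SENTENCES for e in SENTENCES)

# A triple in which each of k_1, ..., k_4 contains one state-description.
E = frozenset(range(N))
I = frozenset({0, 1})
H = frozenset({0, 2})
```

For the triple I, H, E the value r(M_A, I, H, E) is negative while
r(M_B, I, H, E) is positive, so that the simple relevance of i changes with
the m-function; extreme_concepts_agree() checks that no such change occurs
for the extreme concepts in this model.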


§ 75. Complete Relevance 


We call i completely positive to h on e if i is positive and no weaker
sentence (or, more exactly, no sentence L-implied by e . i) is negative
(D1a). This holds, in 𝔏_N, if i is positive and no content-element of i is
negative (T1a). Thus this concept is the counterpart to that of extreme
positive relevance; while the latter is based on the first method, the
present concept is based on the second. Analogous definitions are laid down
for complete negative relevance and complete irrelevance. The latter concept
occurs only in certain trivial cases. The same holds for Keynes's concept of
irrelevance in the strict sense, which is similar to complete irrelevance.


The concepts to be introduced here are analogous to those in the preceding
section. There we considered the special case of the positive relevance of
i, where no disjunctive part of i is negative. Here we shall consider the
case where no conjunctive part of i is negative.

Thus we apply here the second method of relevance analysis as explained
in § 73. It consists in analyzing i into its ultimate conjunctive
parts, the content-elements of i. These content-elements are the negations
of the ℨ in ℜ(~i) (D73-1b), hence in ℜ_3, ℜ_4, ℜ_7, and ℜ_8. However,
as we have seen (T73-3c), for the negations of the ℨ in ℜ_7 and ℜ_8 r = 0,
while for those in ℜ_3 r ≤ 0, and for those in ℜ_4 r ≥ 0. The special case
to be here considered is the case where for i r > 0 and for no
content-element of i r < 0; it obviously holds if and only if for the
negation of at least one, and hence of every, ℨ in ℜ_4 r > 0 and for none in
ℜ_3 r < 0. In the table T72-8 we find easily that this situation holds in
the cases Nos. 11 and 14 and in no others. If the situation described holds,
we shall say that i is completely positively relevant or, briefly, completely
positive to h on e. However, here again we shall formulate the definition not
in terms of ℨ but in a manner applicable to 𝔏_∞ too. We shall require in the
definition that i is positive and that none of those sentences j is negative
whose content is included in that of i or even in that of e . i, in other
words, those which are L-implied either by i alone or by i together with
e (D1a). (It makes no difference in the resulting concept whether we add
‘either alone or together with e’ or not.) The terms ‘completely negative’
and ‘completely irrelevant’ will be defined analogously (D1b and d). The
latter concept is again not very important.


D75-1. Let e, h, and i be any sentences in a finite or infinite system 𝔏.
Let c be a regular c-function in 𝔏; the relevance concepts (‘positive’, etc.)
are meant with respect to c in 𝔏.

+a. i is completely positive to h on e (with respect to c in 𝔏) =Df (1) i is
positive to h on e, and (2) for every sentence j in 𝔏, if ⊢ e . i ⊃ j
then j is not negative to h on e.

+b. i is completely negative to h on e (with respect to c in 𝔏) =Df (1) i is
negative to h on e, and (2) for every j, if ⊢ e . i ⊃ j then j is not
positive to h on e.

c. i is completely relevant to h on e (with respect to c in 𝔏) =Df i is
either completely positive or completely negative to h on e.

d. i is completely irrelevant to h on e =Df (1) i is irrelevant to h on e,
and (2) for every j, if ⊢ e . i ⊃ j then j is not relevant to h on e.


One might perhaps consider an alternative definition D′ formed from D1a
by replacing the condition (2) by the stronger condition that every sentence
L-implied by e . i be positive. This change, however, would not lead to a
suitable concept. It is certainly convenient to define all concepts of
relevance or irrelevance in such a manner that they make no difference
between two sentences i and i′ which are L-equivalent with respect to e. This
requirement is fulfilled by all concepts defined in this chapter, but it
would not be fulfilled by D′. In order to show this, let i be completely
positive to h on e in the stronger sense of D′, and let e be factual. Now we
take as i′ the conjunction i . ~ℨ_i, where ℨ_i is any ℨ in ℜ(~e). (There must
be such ℨ because e is not L-true.) Then it can be shown that (1) i and i′
are L-equivalent with respect to e, and (2) nevertheless, i′ does not satisfy
the definition D′. [Proof. 1. ⊢ ℨ_i ⊃ ~e (T20-2t), hence ⊢ e ⊃ ~ℨ_i.
Therefore e . i L-implies ~ℨ_i, and also i, hence their conjunction i′. On
the other hand, i′ and hence e . i′ L-implies i. Hence the assertion (1).
2. ~ℨ_i is L-implied by i′ and hence by e . i′. Nevertheless, ~ℨ_i is
not positive to h on e but irrelevant (T72-3e). Hence the assertion (2).]

The following theorems are restricted to 𝔏_N for the same reason as those
in the preceding section. T1 gives various conditions which are logically
equivalent to complete positive relevance; T2 does the same for complete
negative relevance, and T3 for complete irrelevance.


T75-1. Let e, h, and i be sentences in 𝔏_N. Each of the following
conditions (a) to (h) is sufficient and necessary for i to be completely positive
to h on e.
+a. i is positive to h on e, and for no content-element of i r < 0.


Proof. 1. Let i be completely positive. Let ~ℨ_i be a content-element of i,
in other words, let ℨ_i be in ℜ(~i). Then ⊢ ℨ_i ⊃ ~i, hence ⊢ i ⊃ ~ℨ_i.
Therefore ~ℨ_i is not negative (D1a(2)), hence its r is not < 0 (T67-10b). 2. Let i be
positive, and let r for no content-element of i be < 0. Let j be any sentence such
that ⊢ e.i ⊃ j. Then ⊢ ~j ⊃ ~(e.i); and hence ℜ(~j) ⊂ ℜ(~(e.i)). If
j is L-true, its r is 0 and hence j is not negative. Now suppose that j is not
L-true; then ~j is not L-false, and hence ℜ(~j) is not null. Let ~ℨ_i be a
content-element of j; in other words, let ℨ_i be in ℜ(~j). Then ℨ_i is in ℜ(~(e.i)),
hence in one of the ranges ℜ_3, ℜ_4, ..., ℜ_8. If ℨ_i is in one of the ranges ℜ_5, ...,
ℜ_8, then for ~ℨ_i r = 0 (T72-3d). If ℨ_i is in ℜ_3 or in ℜ_4, then it is in ℜ(~i);
therefore ~ℨ_i is a content-element of i and hence, according to our
assumption, its r is not < 0. Thus for no content-element of j r < 0. Therefore r for
j is not < 0 (T73-3b); hence j is not negative (T67-10b). Since this holds for
every j, i is completely positive.

b. Of the cases in the table T72-8 either No. 11 or No. 14 holds for e, h,
and i. (From (a), T72-8(9), (12), and (13).)

+c. m_3 = 0, m_1 > 0, and m_4 > 0. (From (b), T72-8(1), (3), (4).)

d. e.~i.h is L-false (hence ⊢ e.h ⊃ i); e.~i and e.h are not
L-false (hence not ⊢ e ⊃ i and not ⊢ e ⊃ ~h). (From (c).)

e. m_3 = 0, and r(i,h,e) = m_1 × m_4. (From (b), T72-8(8).)

f. m_3 = 0, and r(i,h,e) > 0. (From (b), T72-8(8).)

+g. i is positive, and e.~i.h is L-false (hence ⊢ e.h ⊃ i). (From (b),
T72-8(9) and (3).)

+h. i is positive, and c(i, e.h) = 1. (From (g), T59-1b, T59-5b.)

The condition which (g) and (h) add to the positive relevance of i may
be formulated as follows in terms previously used (§ 61): the likelihood
of i is 1, hence i is predictable by h on e.
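
The equivalence of the conditions (c), (f), and (h) can be checked mechanically on small examples. The following Python sketch is an illustration only and not part of the system. It assumes that m_1, ..., m_4 are the m-values of e.i.h, e.i.~h, e.~i.h, and e.~i.~h, that positive relevance amounts to m_1 × m_4 > m_2 × m_3, that r(i,h,e) = m_1 × m_4 - m_2 × m_3, and that c(i, e.h) = m_1/(m_1 + m_3). The function names are hypothetical. Integer weights are used in place of normalized m-values, since all three conditions are unaffected by a common positive factor.

```python
from fractions import Fraction
from itertools import product


def is_positive(m1, m2, m3, m4):
    # Positive relevance of i to h on e, assumed to mean m1*m4 > m2*m3.
    return m1 * m4 > m2 * m3


def cond_c(m1, m2, m3, m4):
    # T75-1c: m3 = 0, m1 > 0, and m4 > 0.
    return m3 == 0 and m1 > 0 and m4 > 0


def cond_f(m1, m2, m3, m4):
    # T75-1f: m3 = 0 and r(i,h,e) > 0, with r assumed to be m1*m4 - m2*m3.
    return m3 == 0 and (m1 * m4 - m2 * m3) > 0


def cond_h(m1, m2, m3, m4):
    # T75-1h: i is positive and c(i, e.h) = 1.
    if not is_positive(m1, m2, m3, m4):
        return False
    # Positivity forces m1 > 0, so the denominator below is not zero.
    return Fraction(m1, m1 + m3) == 1


def conditions_agree(max_weight=3):
    # Try every combination of integer weights 0..max_weight for m1..m4.
    for m in product(range(max_weight + 1), repeat=4):
        if sum(m) == 0:
            continue  # e would be L-false; c is not defined on such evidence
        if len({cond_c(*m), cond_f(*m), cond_h(*m)}) != 1:
            return False
    return True


print(conditions_agree())
```

A check of this kind is no substitute for the derivations from T72-8; it merely confirms, under the stated assumptions, that the three conditions single out the same m-value patterns.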

T75-2. Let e, h, and i be sentences in 𝔏_N. Each of the following
conditions (a) to (g) is sufficient and necessary for i to be completely negative
to h on e.
+a. i is negative to h on e, and for no content-element of i r > 0.
(Analogous to T1a.)

b. Of the cases in the table T72-8 either No. 7 or No. 15 holds for e, h,
and i. (From (a), T72-8(9), (12), and (13).)

+c. m_4 = 0, m_2 > 0, and m_3 > 0. (From (b), T72-8(2), (3), (4).)

d. e.~i.~h is L-false (hence ⊢ e.~h ⊃ i; ⊢ e ⊃ i ∨ h; i and h are
L-disjunct with respect to e); e.~i and e.~h are not L-false
(hence not ⊢ e ⊃ i and not ⊢ e ⊃ h). (From (c).)

e. m_4 = 0, and r(i,h,e) = -m_2 × m_3. (From (b), T72-8(8).)

f. m_4 = 0, and r(i,h,e) < 0. (From (b), T72-8(8).)

+g. i is negative, and e.~i.~h is L-false. (From (b), T72-8(9) and
(4).)

T75-3. Let e, h, and i be sentences in 𝔏_N. Each of the following
conditions (a) to (f) is sufficient and necessary for i to be completely irrelevant
to h on e.
+a. i is irrelevant to h on e, and for every content-element of i r = 0.

Proof. 1. Let i be completely irrelevant. Let ~ℨ_i be a content-element of i.
Then ⊢ i ⊃ ~ℨ_i. Therefore ~ℨ_i is not relevant (D1d(2)); hence its r is 0
(T67-10c). 2. Let i be irrelevant, and let r = 0 for every content-element of i.
Let j be any sentence such that ⊢ e.i ⊃ j. Then r for j is 0 (in analogy to the
proof of T1a(2)). Hence j is not relevant (T67-10c). Since this holds for every
j, i is completely irrelevant.

b. One of the cases Nos. 1, 2, 3, 5, 6, 9, 10, and 13 in the table T72-8
holds for e, h, and i. (From (a), T72-8(9), (12), and (13).)

c. Either all four values m_1, m_2, m_3, and m_4 are 0, or any three of them
are 0, or m_1 and m_3 are 0, or m_2 and m_4 are 0, or m_3 and m_4 are 0.
(From (b), T72-8(1) to (4).)

+d. At least one of the sentences e.~i, e.h, and e.~h is L-false.

Proof. e.~i is L-false if and only if m_3 = m_4 = 0 (T65-1b). This is the
case in Nos. 1, 5, 9, and 13 in T72-8. e.h or e.~h or both are L-false in Nos. 1,
2, 3, 5, 6, 9, and 10 (T72-8(6) and (7)). Hence the assertion by (b).

e. i is irrelevant to h on e, and either m_3 = 0 or m_4 = 0. (From (b),
T72-8(9), (3), and (4).)

f. Either e.~i is L-false (hence ⊢ e ⊃ i), or c(h,e) is either 0 or 1.
(From (d), T59-1b and e, T59-5b and c.)

The results of T1b, T2b, and T3b, based on columns (9), (12), and
(13) of Table T72-8, are listed in column (14).

The following theorem shows that for any content-element, relevance 
and irrelevance is always complete. 

T75-4. Let e and h be sentences in 𝔏_N, and ℨ_i be any ℨ in 𝔏_N.

a. If ~ℨ_i is positive to h on e, then it is completely positive.

Proof. The content-elements of ~ℨ_i are the negations of the ℨ in ℜ(~~ℨ_i)
(D73-1b), that is, ℜ(ℨ_i). ℨ_i is the only ℨ in ℜ(ℨ_i) (T19-6). Hence the only
content-element of ~ℨ_i is ~ℨ_i itself. For ~ℨ_i r > 0 (T67-10a). Hence the
assertion by T1a.

b. If ~ℨ_i is negative to h on e, then it is completely negative. (From
T2a, in analogy to (a).)

c. If ~ℨ_i is irrelevant to h on e, then it is completely irrelevant. (From
T3a, in analogy to (a).)

d. ~ℨ_i is either completely positive to h on e, or completely negative,
or completely irrelevant. (From T65-4g, (a), (b), (c).)


+T75-6. Let e, h, i, and j be sentences in 𝔏_N such that ⊢ e.i ⊃ j.
Then the following holds.

a. If i is completely positive to h on e, and not ⊢ e ⊃ j, then j is
likewise completely positive.

Proof. ⊢ e.h ⊃ i (T1d). Hence ⊢ e.h ⊃ e.i; hence ⊢ e.h ⊃ j. Hence the
assertion with T1d.

b. If i is completely negative to h on e and not ⊢ e ⊃ j, then j is
likewise completely negative. (From T2d, in analogy to (a).)

c. If i is completely irrelevant to h on e, then j is likewise.

Proof. 1. Let e.~i be L-false. Since ⊢ e.i ⊃ j, ⊢ e.~j ⊃ ~i (T21-5h(6)),
hence ⊢ e.~j ⊃ e.~i; therefore e.~j is L-false too. Hence the assertion
by T3d. 2. Let e.~i be not L-false. Then either e.h or e.~h is L-false
(T3d). Therefore every sentence is completely irrelevant (T3d), hence also j.


T6 shows that complete positive or negative relevance and complete
irrelevance are transmitted from i to any weaker sentence j not L-implied
by e.

The complete relevance concepts are, like the extreme relevance concepts
(T74-7), independent of the particular c-function chosen. This is
stated by the following theorem.


T75-7. Let c and c' be any regular c-functions for 𝔏_N, and h, e, and i
any sentences in 𝔏_N.

a. If i is completely positive to h on e with respect to c, then likewise
with respect to c'. (From T1d.)

b. If i is completely negative to h on e with respect to c, then likewise
with respect to c'. (From T2d.)

c. If i is completely relevant to h on e with respect to c, then likewise
with respect to c'. (From (a), (b).)

d. If i is completely irrelevant to h on e with respect to c, then likewise
with respect to c'. (From T3d.)


Complete irrelevance is, like extreme irrelevance, not a very useful
concept. At least in 𝔏_N, it occurs only in the following two trivial cases, as
we see from T3d: (1) e L-implies h or ~h; (2) e L-implies i. In the case (1),
the prior evidence e decides already entirely about the hypothesis h by
either L-implying or excluding it; thus no additional evidence can have any
relevance; in this case every sentence is irrelevant to h on e and,
moreover, extremely and completely irrelevant. In the case (2), i does not add
any new evidence to e. Nevertheless, in analogy to the situation with
extreme irrelevance, it is of interest to notice that the defining condition
in D1d does only hold in the trivial cases just described.

Keynes has given definitions for the concepts of irrelevance and
relevance which we have adopted (except for the somewhat wider sense which
we have given to ‘irrelevance’ in D65-1d). He believes, however, that
another, stronger concept of irrelevance would be theoretically preferable
([Probab.], p. 55). His definition for it, expressed in our terminology, is
as follows: i is irrelevant in the strict sense to h on evidence e =Df there
is no j such that c(j, e.i) = 1, c(j,e) ≠ 1, and c(h, e.j) ≠ c(h,e). For the
sake of simplicity, let us restrict the following discussion to a finite system

implicitly contained in the statement of a probability value with the
evidence e.i.) Keynes points out, correctly, that “it would sometimes
occur that a part of evidence would be relevant, which taken as a whole
was irrelevant” (p. 55). He believes that “we must regard evidence as
relevant, part of which is favourable [i.e., positive] and part unfavourable
[i.e., negative], even if, taken as a whole, it leaves the probability
unchanged” (p. 72). These considerations may seem quite plausible at first.
As an example, let us take a case of the singular predictive inference. Let h
be ‘Pc’, where ‘P’ is a primitive predicate. Let e describe a sample of s =
2n individuals, to which c does not belong, to the effect that n of these
individuals are P and the other n are not-P. Then for many c-functions
(among them our function c* to be introduced later) c(h,e) = 1/2,
independently of s. Let i be ‘Pa.~Pb’, where ‘a’ and ‘b’ do not occur in e.
Then the sample described in e.i does again contain equal numbers of P
and not-P; thus again c(h, e.i) = 1/2. Therefore i is irrelevant (in the
simple sense) to h on e. Let j_1 be ‘Pa’, j_2 ‘~Pb’; hence i is j_1.j_2. Now in
e.j_1 the number of P is larger than that of not-P, while in e.j_2 the
inverse holds. Therefore, for many c-functions (among them c* and
presumably all adequate explicata) the following is the case: c(h, e.j_1) > 1/2,
c(h, e.j_2) < 1/2. Thus j_1 is positive to h on e, and j_2 negative. i is
irrelevant as a whole but consists of positive and negative parts. Keynes
presumably intended to exclude cases of this kind when he suggested the
stronger concept of irrelevance. In the example just given, i is irrelevant
to h on e in the simple sense but not in the strict sense. The problem is
whether under normal conditions any cases can be found where the latter
concept applies. Keynes himself has not given any example for his
concept. If we analyze his concept for a finite system 𝔏_N with the help of our
second method, we find that it is essentially the same as our concept of
complete irrelevance with the condition added that e.i not be L-false.
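
The example of the sample with equal numbers of P and not-P can be followed numerically. The sketch below is an illustration under a simplifying assumption that is not stated in the text: the language is taken to contain the single primitive predicate ‘P’, so that the value for the singular predictive inference is assumed to reduce to Laplace's rule of succession, (s_1 + 1)/(s + 2), where s_1 of the s observed individuals are P. Other c-functions will give other numbers, though the same pattern is to be expected. The function name `c_succession` and the choice n = 5 are hypothetical.

```python
from fractions import Fraction


def c_succession(num_p, num_not_p):
    # Assumed value of c(h, evidence) for h = 'Pc', given a sample with
    # num_p individuals that are P and num_not_p that are not-P.
    return Fraction(num_p + 1, num_p + num_not_p + 2)


n = 5                                  # e: n individuals P, n individuals not-P
c_e = c_succession(n, n)               # c(h, e)
c_e_i = c_succession(n + 1, n + 1)     # c(h, e.i), with i = 'Pa.~Pb'
c_e_j1 = c_succession(n + 1, n)        # c(h, e.j_1), with j_1 = 'Pa'
c_e_j2 = c_succession(n, n + 1)        # c(h, e.j_2), with j_2 = '~Pb'

print(c_e, c_e_i, c_e_j1, c_e_j2)      # prints: 1/2 1/2 7/13 6/13
```

Under this assumption i leaves the value at 1/2, while its conjunctive components j_1 and j_2 raise and lower it respectively, which is the situation Keynes wished to exclude.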


Proof. Condition (2) in the above formulation of Keynes's definition can be
transformed as follows. It says that no j such that ⊢ e.i ⊃ j but not ⊢ e ⊃ j
is relevant to h on e. ⊢ e.i ⊃ j if and only if ⊢ ~j ⊃ ~(e.i), hence if and
only if ℜ(~j) is included in ℜ(~(e.i)), hence in the class-sum of ℜ_3, ℜ_4, ...,
ℜ_8. ⊢ e ⊃ j if and only if ⊢ ~j ⊃ ~e, hence if and only if ℜ(~j) is included
in ℜ(~e), hence in the class-sum of ℜ_5, ..., ℜ_8. Thus (2) means the
following: no j is relevant if it is such that every content-element of it is the negation
of a ℨ in one of the ranges ℜ_3, ..., ℜ_8, and at least one is the negation of a ℨ
in ℜ_3 or ℜ_4. In other words, (2) means that no negation of any ℨ in one of the
ranges ℜ_3, ..., ℜ_8 is relevant. (This shows, incidentally, that the condition
‘not ⊢ e ⊃ j’ in Keynes's definition can be omitted without changing the
result.) Thus (2) means that no content-element of i is relevant. And this is
indeed what Keynes intended; because he required that no part of i be relevant,
and by ‘part’ he meant content-part, i.e., conjunctive part. Now we see that
this is the same as complete irrelevance of i (T3a). Thus, i is irrelevant to h on e
(in 𝔏_N) in Keynes's strict sense if and only if (1) e.i is not L-false, and (2) i is
completely irrelevant to h on e. Among the cases in the table T72-8, (2) holds
in Nos. 1, 2, 3, 5, 6, 9, 10, and 13 (T3b); (1) holds in Nos. 5 to 16 (T72-8(5));
thus Keynes's concept holds in Nos. 5, 6, 9, 10, and 13.


We see that i is irrelevant in the strict sense to h on e (in 𝔏_N) if and only
if (1) e.i is not L-false, and (2) at least one of the sentences e.~i,
e.h, and e.~h is L-false (T3d). This shows that for any e and h (in 𝔏_N)
such that neither h nor ~h follows from e, there cannot be any sentence i
which says anything new in comparison with e (i.e., which is not L-implied
by e) and which is irrelevant to h on e in the strict sense as defined by
Keynes. [Suppose that not ⊢ e ⊃ i. Then e.~i is not L-false, and hence
its range is not null. If ℨ_i is any ℨ in this range, hence in ℜ_3 or ℜ_4, then
~ℨ_i is a counterexample to Keynes's definition; it is L-implied by e.i
but not by e, and it is relevant to h on e (T72-4e(6), T72-5e(6)).] In other
words, under ordinary conditions there are always content-elements of i
which are relevant to h on e. Therefore, the concept of irrelevance in the
strict sense, like that of complete irrelevance, is not preferable to the
simple concept of irrelevance; it cannot take its place but is useful, if at
all, only as a special or, so to speak, degenerate case of the latter concept.
§ 76. Relations between Extreme and Complete Relevance


Theorems are stated which deal with extreme and complete relevance and
irrelevance. Four kinds of positive relevance are distinguished (neither extreme
nor complete, extreme but not complete, complete but not extreme, extreme
and complete); and for each of them a sufficient and necessary condition is
given (T1). The same is done for negative relevance (T2) and for irrelevance
(T3). Finally it is examined how each of the concepts of extreme or complete,
positive or negative, relevance or irrelevance is transformed by negating i or h
or both or by exchanging i and h (T5). Thus it is found, for example, that the
following conditions are logically equivalent (T5a): (1) i is extremely positive
to h on e; (2) ~i is completely negative to h on e; (3) i is extremely negative
to ~h on e; (4) h is completely positive to i on e.


The theorems of this section make use of the concepts of extreme
relevance and of complete relevance. T1 deals with the four possible cases of
positive relevance for a finite system 𝔏_N; T2 does the same for negative
relevance, and T3 for irrelevance. The results in these theorems are easily
found with the help of the table T72-8, which gives a complete survey of
all possible cases.


T76-1. Let i be positive to h on e in 𝔏_N. Then the following holds. (It is
easily seen that for any given sentences e, h, and i exactly one of the four
conditions is fulfilled.)

a. i is neither extremely nor completely positive if and only if the case
No. 16A in the table T72-8 holds; hence if and only if k_1, k_2, k_3, and
k_4 are not L-false (hence there are no deductive relations between i
and h with respect to e) and m_1 × m_4 > m_2 × m_3 (where all four
m-values are > 0).

b. i is extremely but not completely positive if and only if No. 12 holds;
hence if and only if k_2 is L-false (hence ⊢ e.i ⊃ h) but k_1, k_3, and k_4
are not.

c. i is completely but not extremely positive if and only if No. 14 holds;
hence if and only if k_3 is L-false (hence ⊢ e.h ⊃ i) but k_1, k_2, and k_4
are not.

d. i is extremely and completely positive if and only if No. 11 holds;
hence if and only if k_2 and k_3 are L-false (in other words,
⊢ e ⊃ (i ≡ h), i and h are L-equivalent with respect to e) but k_1
and k_4 are not L-false (in other words, i and h are neither L-exclusive
nor L-disjunct with respect to e; it follows from this that e
L-implies none of the sentences i, ~i, h, and ~h).

(From T72-8.)
T76-2. Let i be negative to h on e in 𝔏_N. Then the following holds. (It is
easily seen that for any e, h, and i exactly one of the four conditions is
fulfilled.)

a. i is neither extremely nor completely negative if and only if the case
No. 16B in the table T72-8 holds; hence if and only if k_1, k_2, k_3, and
k_4 are not L-false (hence there are no deductive relations between i
and h with respect to e) and m_1 × m_4 < m_2 × m_3 (where all four
m-values are > 0).

b. i is extremely but not completely negative if and only if No. 8
holds; hence if and only if k_1 is L-false (hence i and h are L-exclusive
with respect to e) but k_2, k_3, and k_4 are not.

c. i is completely but not extremely negative if and only if No. 15 holds;
hence if and only if k_4 is L-false (hence ⊢ e ⊃ i ∨ h, i and h are
L-disjunct with respect to e) but k_1, k_2, and k_3 are not.

d. i is extremely and completely negative if and only if No. 7 holds;
hence if and only if k_1 and k_4 are L-false (in other words,
⊢ e ⊃ (i ≡ ~h), i and h are L-exclusive and L-disjunct with
respect to e, i is L-equivalent to ~h with respect to e) but k_2 and k_3
are not L-false (in other words, neither ⊢ e.i ⊃ h nor ⊢ e.h ⊃ i; it
follows from this that e L-implies none of the sentences i, ~i, h,
and ~h).

(From T72-8.)

T76-3. Let i be irrelevant to h on e in 𝔏_N. Then the following holds. (It is
easily seen that for any e, h, and i exactly one of the four conditions is
fulfilled.)

a. i is neither extremely nor completely irrelevant if and only if the case
No. 16C in the table T72-8 holds; hence, if and only if k_1, k_2, k_3, and
k_4 are not L-false (hence there are no deductive relations between i
and h with respect to e) and m_1 × m_4 = m_2 × m_3 (where all four
m-values are > 0).

b. i is extremely but not completely irrelevant if and only if No. 4 holds;
hence if and only if k_1 and k_2 are L-false (in other words, e.i is
L-false) but k_3 and k_4 are not L-false (in other words, e.~i
L-implies neither h nor ~h).

c. i is completely but not extremely irrelevant if and only if No. 13
holds; hence if and only if k_3 and k_4 are L-false (in other words,
e.~i is L-false, ⊢ e ⊃ i) but k_1 and k_2 are not L-false (in other
words, e.i L-implies neither h nor ~h).

d. i is extremely and completely irrelevant if and only if one of the
cases Nos. 1, 2, 3, 5, 6, 9, 10 holds; hence if and only if one, two, or
three of the sentences e.i, e.h, and e.~h are L-false but not e.i
alone. Here we may distinguish two kinds of cases:

(1) One of the cases Nos. 1, 2, 3 holds if and only if k_1, k_2, and at
least one of the sentences k_3 and k_4 are L-false (in other words,
e.i and at least one of the sentences e.h and e.~h are
L-false).

(2) One of the cases Nos. 5, 6, 9, 10 holds if and only if either k_1 and
k_3 are L-false but not k_2, or k_2 and k_4 are L-false but not k_1 (in
other words, exactly one of the sentences e.h and e.~h is
L-false but e.i is not).

(From T72-8.)


It is interesting to notice that in each of the three theorems T1, T2,
and T3, only part (a) is dependent upon the choice of a particular
m-function. These are the cases No. 16A, B, and C, where no deductive relations
hold between i and h with respect to e. All other parts of these theorems
hold alike for all regular m-functions; they depend merely upon deductive
relations or, more specifically, upon the L-falsity of some of the sentences
k_1, ..., k_4. Thus, as we found earlier (T74-7, T75-7), all the concepts
of extreme or complete relevance or irrelevance with respect to finite
systems 𝔏_N are independent of the m-functions.
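
The observation that only the cases without deductive relations depend on the m-function can be made concrete. The following Python sketch is offered as an illustration only. It assumes that the kind of relevance is decided by the sign of m_1 × m_4 - m_2 × m_3, and it represents each case by the pattern of which of k_1, ..., k_4 are not L-false. The names `relevance_sign` and `sign_fixed_by_pattern` and the sample weights are hypothetical; unnormalized positive weights are used, since the sign does not change under a common positive factor.

```python
from itertools import product


def relevance_sign(m1, m2, m3, m4):
    # +1 for positive relevance, -1 for negative, 0 for irrelevance,
    # assuming the sign of m1*m4 - m2*m3 decides the matter.
    d = m1 * m4 - m2 * m3
    return (d > 0) - (d < 0)


def sign_fixed_by_pattern(pattern, samples):
    # pattern[n] is 1 if k_(n+1) is not L-false (m-value > 0), else 0.
    # The sign is "fixed" if every sample assignment yields the same sign.
    signs = set()
    for weights in samples:
        m = [w if keep else 0 for keep, w in zip(pattern, weights)]
        signs.add(relevance_sign(*m))
    return len(signs) == 1


# A few arbitrary positive weightings standing in for different m-functions.
samples = [(1, 1, 1, 1), (3, 1, 1, 1), (1, 3, 1, 1), (1, 1, 5, 2)]

dependent = [p for p in product((0, 1), repeat=4)
             if not sign_fixed_by_pattern(p, samples)]
print(dependent)   # prints: [(1, 1, 1, 1)]
```

With these sample weightings, the only pattern whose sign varies is the one in which none of the four k-sentences is L-false, in agreement with the remark that only part (a) of T1, T2, and T3 depends on the m-function.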

In general the transition from one c-function to another changes the 
relevance situation. However, there are special cases where one and the 
same of the four relevance concepts holds with respect to every regular 
c-function. The following theorem says that in any case of this kind either 
the extreme or the complete concept holds. 


T76-4. Let h, e, and i be sentences in 𝔏_N.

a. i is positive to h on e with respect to every regular c-function in 𝔏_N if
and only if i is either extremely positive to h on e with respect to
every regular c or completely positive to h on e with respect to every
regular c, or both.


Proof. Let i be positive to h on e with respect to every regular c. Then, for
every regular m, (1) m_1 × m_4 > m_2 × m_3 (T65-4c(2)). Therefore, for every
regular m, (2) m_1 > 0 and m_4 > 0, and (3) m_2 × m_3 = 0. [(3) is seen as
follows. If for some m (3) did not hold, m_2 and m_3 and hence all four m-values
would be > 0. Since the four k-sentences are L-exclusive in pairs and their
disjunction is L-equivalent to e, we could in this case choose another m-function
m' such that m'_1 = m'_2 = m'_3 = m'_4 = m'(e)/4; hence (1) would not hold for m'.]
From (3): for any m, either m_2 or m_3 or both are 0. Hence either i is extremely
positive with respect to the given c ((2), T74-1c) and hence with respect to every
c (T74-7a), or i is completely positive with respect to every c ((2), T75-1c,
T75-7a) or both. The converse follows from D74-1a(1) and D75-1a(1).
b. i is negative to h on e with respect to every regular c-function in 𝔏_N if
and only if i is either extremely negative to h on e with respect to
every regular c or completely negative to h on e with respect to every
regular c, or both. (From T65-4d(2), T74-2c, T74-7b, T75-2c, T75-7b,
D74-1b(1), D75-1b(1), in analogy to (a).)

c. i is relevant to h on e with respect to every regular c-function in 𝔏_N if
and only if at least one of the following four conditions is fulfilled:
(i) i is extremely positive to h on e with respect to every regular c;
(ii) i is completely positive to h on e with respect to every regular c;
(iii) i is extremely negative to h on e with respect to every regular c;
(iv) i is completely negative to h on e with respect to every
regular c. (From T65-5c, (a), (b).)

d. i is relevant to h on e with respect to every regular c-function in 𝔏_N if
and only if i is either extremely relevant to h on e with respect to
every regular c or completely relevant to h on e with respect to every
regular c, or both. (From (c), D74-1c, D75-1c.)

e. If i is irrelevant to h on e with respect to every regular c-function in
𝔏_N, then i is either extremely irrelevant to h on e with respect to every
regular c or completely irrelevant to h on e with respect to every
regular c, or both.


Proof. Let the condition be fulfilled. Then, for every regular m, (1) m_1 × m_4
= m_2 × m_3 (T65-4f(2)). Therefore, for every regular m, (2) m_1 × m_4 =
m_2 × m_3 = 0. [This is seen as follows. If for some m (2) were not fulfilled, then
both products and hence all four m-values would be > 0. Then, however, we
could easily construct another function m' which would not satisfy (1) (for
instance, by taking m'_1 = m'_4 = m(e)/3 and m'_2 = m'_3 = m(e)/6).] From (2): for
any m, in each of the two products at least one factor must be 0. Hence, for
any m, at least one of the following four conditions is fulfilled: (i) m_1 = m_2 = 0;
(ii) m_1 = m_3 = 0; (iii) m_2 = m_4 = 0; (iv) m_3 = m_4 = 0. If one of the first
three conditions is fulfilled, i is extremely irrelevant with respect to the c in
question (T74-3c) and hence with respect to every c (T74-7d). If one of the
last three conditions is fulfilled, i is completely irrelevant with respect to every
c (T75-3c, T75-7d).
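
The two auxiliary m-functions used in the proofs of (a) and (e) can be verified by direct computation. The sketch below is illustrative only; it assumes m(e) = 1 for convenience and takes r = m_1 × m_4 - m_2 × m_3 as the quantity whose sign decides between positive relevance, negative relevance, and irrelevance. The names `r`, `uniform`, and `skewed` are hypothetical.

```python
from fractions import Fraction

m_e = Fraction(1)   # assumed value of m(e), chosen only for illustration


def r(m1, m2, m3, m4):
    # Assumed relevance quantity: positive, zero, or negative according as
    # i is positive, irrelevant, or negative to h on e.
    return m1 * m4 - m2 * m3


# Proof of (a): all four values equal to m(e)/4 violate the strict
# inequality m1*m4 > m2*m3.
uniform = (m_e / 4, m_e / 4, m_e / 4, m_e / 4)

# Proof of (e): m1 = m4 = m(e)/3 and m2 = m3 = m(e)/6 violate the
# equality m1*m4 = m2*m3.
skewed = (m_e / 3, m_e / 6, m_e / 6, m_e / 3)

print(sum(uniform), r(*uniform))   # prints: 1 0
print(sum(skewed), r(*skewed))     # prints: 1 1/12
```

Both assignments distribute m(e) over the four k-sentences with all values positive, so each is admissible whenever no k-sentence is L-false; this is all the two proofs require.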


We have earlier seen (§ 65) how positive or negative relevance or
irrelevance is changed or remains unchanged when i or h are negated or when
i and h exchange their places. Now we shall investigate what happens to
extreme or complete relevance or irrelevance under such conditions. T5
gives the answers to these questions.


T76-5. Let e, h, and i be any sentences in 𝔏_N. On each of the lines (a)
to (f) in the following table, the condition in column (1) is logically
equivalent to each of the conditions in columns (2) to (5) (except for (c)(5) and
(f)(5)); hence any two of the latter conditions are likewise logically
equivalent. (Thus, for instance, (a)(1)(2) says this: i is extremely positive
to h on e if and only if ~i is completely negative to h on e.) We write
‘extr.’, ‘compl.’, ‘pos.’, ‘neg.’, and ‘irrel.’ as short for ‘extremely’,
‘completely’, ‘positive’, ‘negative’, and ‘irrelevant’, respectively.

         (1)             (2)             (3)             (4)              (5)
     i to h on e    ~i to h on e    i to ~h on e    ~i to ~h on e    h to i on e

a.   extr. pos.     compl. neg.     extr. neg.      compl. pos.      compl. pos.
b.   extr. neg.     compl. pos.     extr. pos.      compl. neg.      extr. neg.
c.   extr. irrel.   compl. irrel.   extr. irrel.    compl. irrel.    (extr. or compl. irrel.)
d.   compl. pos.    extr. neg.      compl. neg.     extr. pos.       extr. pos.
e.   compl. neg.    extr. pos.      compl. pos.     extr. neg.       compl. neg.
f.   compl. irrel.  extr. irrel.    compl. irrel.   extr. irrel.     (extr. or compl. irrel.)


Proof. If i or h are negated or i and h are exchanged, then each of the
sentences k_1, k_2, k_3, and k_4 (as explained in § 72) is transformed into another one
as follows.

(1)   i, h, e        k_1   k_2   k_3   k_4
(2)   ~i, h, e       k_3   k_4   k_1   k_2
(3)   i, ~h, e       k_2   k_1   k_4   k_3
(4)   ~i, ~h, e      k_4   k_3   k_2   k_1
(5)   h, i, e        k_1   k_3   k_2   k_4


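Thus each of the cases Nos. 1 to 16 in the table T72-8 is transformed into another
of these cases as indicated in the following table:

            (1)                        (2)        (3)        (4)         (5)
No.      i, h, e                    ~i, h, e   i, ~h, e   ~i, ~h, e   h, i, e

 1    extr. and compl. irrel.           1          1          1           1
 2    extr. and compl. irrel.           5          3          9           2
 3    extr. and compl. irrel.           9          2          5           5
 4    extr. irrel.                     13          4         13           6
 5    extr. and compl. irrel.           2          9          3           3
 6    extr. and compl. irrel.           6         10         10           4
 7    extr. and compl. neg.            11         11          7           7
 8    extr. neg.                       14         12         15           8
 9    extr. and compl. irrel.           3          5          2           9
10    extr. and compl. irrel.          10          6          6          13
11    extr. and compl. pos.             7          7         11          11
12    extr. pos.                       15          8         14          14
13    compl. irrel.                     4         13          4          10
14    compl. pos.                       8         15         12          12
15    compl. neg.                      12         14          8          15
16                                     16         16         16          16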
For example, the table says that No. 2 for i,h,e is transformed for ~i,h,e into
No. 5. This is seen as follows. No. 2 is the case where m_1 = m_2 = m_3 = 0,
m_4 > 0 (T72-8, columns (1) to (4)), hence where k_1, k_2, and k_3 are L-false but
k_4 not. Now, according to line (2) of the previous table in this proof, k_1, k_2, k_3,
and k_4 are transformed for ~i,h,e into k_3, k_4, k_1, and k_2, respectively. Therefore,
case No. 2 is transformed into the case where k_3, k_4, and k_1 are L-false but k_2 is
not, hence where m_3, m_4, and m_1 are 0 but m_2 > 0; and this is No. 5.

In column (1) of the last table, indications of extreme and complete
relevance or irrelevance are given for each case, taken from column (14) of T72-8.
This makes it easy to prove each item of T5. It will be sufficient to explain here
one instance, say the item (a)(1)(2). It says that i is extr. pos. to h on e if and
only if ~i is compl. neg. to h on e. This is proved by the above table as follows.
We see in column (1) that i is extr. pos. to h on e in the cases Nos. 11 and 12 and
no others. We see in column (2) that for ~i to h on e these two cases are
transformed into the cases Nos. 7 and 15. And then we find (by going back to
column (1)) that Nos. 7 and 15 are the only cases in which compl. neg. holds. This
and the other items can also be proved directly (i.e., without the use of the last
table) with the help of T74-1c, T74-2c, T74-3c, T75-1c, T75-2c, T75-3c, and
the first table.

The procedure for all other items in T5 is analogous, except for (c)(5) and
(f)(5). In these two points there is no sufficient and necessary condition. For
example, (c)(1)(5) says merely that, if (but not only if) i is extr. irrel. to h on e,
then h is either extr. irrel. or compl. irrel. (or both) to i on e. This is seen as
follows. We find in the last table that i is extr. irrel. to h on e in Nos. 1 to 6, 9,
and 10, and in no others. We see in column (5) that these cases are transformed
for h,i,e into Nos. 1, 2, 5, 6, 3, 4, 9, and 13. These cases, however, do not
represent just one kind of irrel.; some are extr. irrel., but No. 13 is not; some are
compl. irrel., but No. 4 is not; and No. 10 is not among them although it is extr.
and compl. irrel. Thus here only the weak statement given above can be made.
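
The items of T5 can also be checked by enumeration, in the spirit of the direct proof mentioned above. The following Python sketch is an illustration and not a replacement for the proof. Each case is represented by a pattern (p_1, ..., p_4), where p_n is true if k_n is not L-false. The conditions for the extreme and complete concepts are those stated in T76-1 to T76-3 and in the proof of T76-4e; the four transformations are those of the first table in this proof. The names `kinds`, `TRANSFORMS`, and `TABLE` are hypothetical.

```python
from itertools import product


def kinds(p):
    # Labels holding for the pattern p, read off the L-falsity conditions.
    p1, p2, p3, p4 = p
    out = set()
    if p1 and p4 and not p2:
        out.add("extr. pos.")
    if p1 and p4 and not p3:
        out.add("compl. pos.")
    if p2 and p3 and not p1:
        out.add("extr. neg.")
    if p2 and p3 and not p4:
        out.add("compl. neg.")
    if not (p1 or p2) or not (p1 or p3) or not (p2 or p4):
        out.add("extr. irrel.")
    if not (p3 or p4) or not (p1 or p3) or not (p2 or p4):
        out.add("compl. irrel.")
    return out


# Column number -> how the pattern changes (lines (2) to (5) of the first table).
TRANSFORMS = {
    2: lambda p: (p[2], p[3], p[0], p[1]),   # ~i, h, e
    3: lambda p: (p[1], p[0], p[3], p[2]),   # i, ~h, e
    4: lambda p: (p[3], p[2], p[1], p[0]),   # ~i, ~h, e
    5: lambda p: (p[0], p[2], p[1], p[3]),   # h, i, e
}

# Lines (a) to (f) of T5; (c)(5) and (f)(5) are treated separately below.
TABLE = {
    "extr. pos.":    {2: "compl. neg.", 3: "extr. neg.", 4: "compl. pos.", 5: "compl. pos."},
    "extr. neg.":    {2: "compl. pos.", 3: "extr. pos.", 4: "compl. neg.", 5: "extr. neg."},
    "extr. irrel.":  {2: "compl. irrel.", 3: "extr. irrel.", 4: "compl. irrel."},
    "compl. pos.":   {2: "extr. neg.", 3: "compl. neg.", 4: "extr. pos.", 5: "extr. pos."},
    "compl. neg.":   {2: "extr. pos.", 3: "compl. pos.", 4: "extr. neg.", 5: "compl. neg."},
    "compl. irrel.": {2: "extr. irrel.", 3: "compl. irrel.", 4: "extr. irrel."},
}

PATTERNS = list(product((False, True), repeat=4))


def table_holds():
    # Each entry must be equivalent to the condition in column (1).
    return all((label in kinds(p)) == (target in kinds(TRANSFORMS[col](p)))
               for p in PATTERNS
               for label, row in TABLE.items()
               for col, target in row.items())


def weak_column5_holds():
    # (c)(5), (f)(5): irrelevance of either kind for i to h implies
    # extr. or compl. irrelevance (or both) for h to i.
    irrel = {"extr. irrel.", "compl. irrel."}
    return all(kinds(TRANSFORMS[5](p)) & irrel
               for p in PATTERNS if kinds(p) & irrel)


print(table_holds(), weak_column5_holds())   # prints: True True
```

Because the extreme and complete concepts depend only on which k-sentences are L-false, sixteen patterns exhaust the possibilities, and no m-values need to be supplied.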


Now let us see what is stated by T5. First let us look only whether pos.,
neg., or irrel. holds in a given case, leaving aside the questions of extr.
and compl. We find that negating i alone or h alone turns pos. into neg.
and vice versa; that negating both i and h or exchanging i and h leaves
pos. and neg. unchanged; and that irrel. remains unchanged under any
of these transformations. All this was already known (T65-3).

The new results in T5 concern the transformation of extr. and compl.
We see that by negating h alone extr. becomes again extr., and compl.
again compl. Negating i (no matter whether i alone or both i and h,
since negating h has no effect in this point, as we have just seen) turns
extr. into compl., and vice versa. Now let us inspect the relation between
columns (1) and (5), the effect of exchanging i and h. For irrel., we have
here only weak statements as explained previously (see the end of the
proof). The results for pos. and neg. may seem surprising at first: for pos.,
extr. is turned into compl. and vice versa; for neg., however, extr. and
compl. remain unchanged. The following considerations may make these
results plausible. Extr. neg. means that i and h are L-exclusive with
respect to e (⊢ e ⊃ ~(i.h), T74-2d); this relation remains, of course,
unchanged when i and h are exchanged. The same holds for compl. neg.,
because this means that i and h are L-disjunct with respect to e
(⊢ e ⊃ i ∨ h, T75-2d). On the other hand, extr. pos. involves an
implicative relation between i and h (⊢ e ⊃ (i ⊃ h), T74-1d) while compl. pos.
involves the converse implicative relation (⊢ e ⊃ (h ⊃ i), T75-1d).
Therefore, the exchange of i and h cannot leave these relations unchanged but
transforms each into the other.


This concludes the theory of relevance dealt with in this chapter, and 
also the general theory of regular c-functions discussed in the last two 
chapters, which constitutes the first and fundamental part of quantitative 
inductive logic. The next chapter (vii) does not belong to quantitative in- 
ductive logic but gives an outline of the foundations of comparative in- 
ductive logic. The construction of quantitative inductive logic will be con- 
tinued in the chapters after the next. 


CHAPTER VII 
COMPARATIVE INDUCTIVE LOGIC 


In this chapter a system of comparative inductive logic is constructed. Its
basis is a comparative concept of confirmation 𝔐ℭ. ‘𝔐ℭ(h,e,h',e')’ is intended
as explicatum for ‘the hypothesis h is confirmed by the evidence e equally
strongly or more strongly than h' by e'’ (§ 79). Thus its meaning corresponds in
some sense to the quantitative statement ‘c(h,e) ≥ c(h',e')’ (§ 80). However,
the definition of 𝔐ℭ (D81-1) uses only L-concepts, no quantitative concepts.
Nevertheless, it is shown that the required correspondence between 𝔐ℭ and
the quantitative c-functions holds (§ 81).

In the same way, ‘Gr(h,e,h',e')’ corresponds to ‘c(h,e) > c(h',e')’, ‘Eq(h,e,h',e')’
to ‘c(h,e) = c(h',e')’ (§ 82), ‘Max(h,e)’ to ‘c(h,e) = 1’, and ‘Min(h,e)’ to ‘c(h,e) =
0’ (§ 84). All these concepts are defined, directly or indirectly, on the basis of
L-concepts without use of quantitative concepts like c-functions. Therefore they
are called purely comparative (i.e., nonquantitative) concepts.

Some of the theorems of this chapter involving these comparative concepts
(§ 85) correspond to certain theorems concerning c-functions stated in a
preceding chapter.

Although we shall try in later chapters to construct a quantitative system of 
inductive logic, at the present time the question whether a comprehensive and 
adequate quantitative system is at all possible is still controversial. This fact is 
the chief reason for the importance of a comparative system of inductive logic. 
However, even if a quantitative system is possible, it is still interesting to see 
which results can be obtained with more restricted means. 

In the last three sections (§§ 86-88) the classificatory concept of
confirmation is investigated: ‘i is confirming evidence for the hypothesis h (on the
evidence e)’, in symbols ‘ℭ(h,i,e)’. It is related to our earlier concept of positive
relevance (§ 65), which was defined in quantitative terms. The problem of an
explicatum for the classificatory concept defined in nonquantitative terms is
discussed but remains unsolved. Several explicata of this kind are examined but
are found to be too narrow.


§ 79. The Problem of a Comparative Concept of Confirmation 


Our problem is to find an adequate definition for a comparative concept of
confirmation MC. ‘MC(h,e,h’,e’)’ is to be an explicatum for the following
explicandum: ‘The hypothesis h is confirmed by the evidence e equally strongly
or more strongly than h’ by e’’. The task of this explication is important
because some authors believe that only a comparative concept of confirmation
is possible, not a quantitative one.


In this chapter the basis of a comparative inductive logic will be
constructed by laying down a definition of a comparative concept of
confirmation and stating some theorems based on this and some other
definitions. We explained earlier the nature of comparative concepts in
general, in contradistinction to classificatory and quantitative concepts
(§ 4), and the role of comparative concepts as explicata (§ 5). Then we
discussed the comparative concept of confirmation as an explicandum (§ 8). For
reasons explained earlier, we use the following form for this concept (see end
of § 4, there called the second kind of comparative concept): ‘the hypothesis h
is confirmed by the evidence e equally strongly or more strongly than h’ by
e’’. Our task will now be to find an adequate explicatum for this concept. We
shall use the symbol ‘MC’ for the explicatum to be defined; hence we shall
write ‘MC(h,e,h’,e’)’ as explicatum for the sentence mentioned. It is to be
noticed that ‘MC’ is not a symbol of our symbolic object-languages L, but
belongs to the metalanguage like the other German letters used (§ 14). While a
quantitative concept of confirmation, i.e., any one of the c-functions which
have been discussed in the two preceding chapters, assigns a number to a pair
of sentences, the comparative concept MC is a relation between four sentences
(h, e, h’, e’ in the above example); we shall sometimes regard it as a dyadic
relation between two pairs of sentences (the pair h,e and the pair h’,e’).

An investigation of the possibilities for defining a comparative concept
of confirmation, in other words, a comparative explicatum for probability₁,
is the more important, since some authors believe that no adequate, entirely
quantitative explicatum for probability₁ can be found and that therefore any
explicatum must be at least partly comparative. Thus Kries, Keynes, and
Koopman, in their theories of probability, restrict the possibility for
numerical values to a narrow class of special cases; and Nagel likewise
expresses serious doubts in this direction.

We have mentioned Kries’s arguments against the possibility of numerical
values (§ 46). Let us now briefly look at the reasons given by Keynes. He
admits that the generally accepted opinion is that the assignment of numerical
values for probability₁ is at least theoretically possible, no matter whether
the actual determination of the values in given cases is practically possible
or not ([Probab.], p. 20); and he quotes W. F. Donkin and De Morgan to this
effect. But as an argument for his own, more skeptical view he points out the
fact that probability₁ is often based on similarity. Thus the probability₁ of
the hypothesis that a certain picture was made by a certain painter may depend
upon its similarity to other known paintings by the same artist. “We can say
that one thing is more like a second object than it is like a third; but there
will very seldom be any meaning in saying that it is twice as like.
Probability is, so far as measurement is concerned, closely analogous to
similarity” (p. 28).

Keynes goes even further and argues that the task of arranging
probabilities₁ without numerical values in a mere order of magnitude, so that
we might say that one probability₁ is greater than another without saying how
much greater, is often unsolvable (p. 29). Consider a general hypothesis h
judged on the basis of different reports e and e’ concerning the results of
different sets of experiments. “If we have more grounds than before,” that is,
if e’ L-implies e without being L-equivalent to it, “comparison is possible;
but if the grounds in the two cases are quite different” (which presumably
means that e and e’ are L-independent, i.e., L-implication does not hold
between either of them and the other or its negation) “even a comparison of
more and less, let alone numerical measurement, may be impossible” (p. 30).

Similarly, Nagel says: “It does not seem possible to assign a quantita- 
tive value to the degree of confirmation of a theory” ([Principles], p. 68). 
One of his arguments is based on the principle of the variety of instances 
(see above, § 47E, and below, § 110 I). Nagel believes that in some cases 
a nonnumerical comparison in terms of more or less is possible, while in 
general not even this is possible so that two degrees are in this case in- 
comparable. 

The comparative concept MC which we shall define will be in agreement with
the conceptions of Kries, Keynes, Nagel, and Koopman in the following respect.
It does not give a comparison in all cases of four sentences. Not even with
respect to a given fixed evidence e does it arrange all hypotheses in a linear
order; in general, two hypotheses h and h’ will turn out to be incomparable on
the evidence e. Likewise, with respect to a given hypothesis h, comparison
will be possible only in certain cases while in general two evidences e and e’
are incomparable with respect to h. Moreover we shall find that MC fulfils all
the axioms in Koopman’s axiom system (§ 83B).


§ 80. Requirements of Adequacy 


In view of the explicandum mentioned in § 79, a relation between sentences
h, e, h’, e’ may be said to be in accord with a given regular c-function c if
for all of its instances the following holds: c(h,e) ≥ c(h’,e’). We lay down
two requirements which a relation must fulfil in order to be an adequate
explicatum for our explicandum: (1) it must be in accord with all regular
c-functions (R1); (2) it must be the most comprehensive relation of this kind
(R2). The task is to construct a purely comparative, i.e., nonquantitative
definition such that those two quantitative requirements are fulfilled. In
preparation for this task, the relations between the ranges of the sentences
involved are analyzed.


In order to prepare the way for the later construction of a definition of
the comparative concept of confirmation MC, we shall discuss here the question
which requirements a concept must fulfil in order to be acceptable as an
adequate explicatum for our explicandum. We remember that the explicandum was
as follows:

(1) h is confirmed by e equally strongly or more strongly than h’ by e’.


Although our aim is to construct a purely comparative definition of MC,
that is, one not containing any quantitative concepts, it will, nevertheless,
be helpful to study, merely for heuristic purposes, the relation between MC
and the quantitative concepts of confirmation, in other words, the regular
c-functions defined earlier (D55-4). The following analysis refers to the
finite language systems L_N.

Suppose somebody has chosen a certain regular c-function c as his concept
of degree of confirmation. Then, to the comparative explicandum (1) mentioned
above, the following quantitative formulation (2) corresponds in some sense:

(2) c(h,e) ≥ c(h’,e’) .

If he now wants to choose a comparative relation of confirmation, say T,
he would make sure that T is in accord with his concept c in this sense: any
sentences h, e, h’, e’ for which T holds fulfil the condition (2).

Which way of finding an adequate concept MC is suggested by these
considerations? It would not do first to select a suitable c-function and then
to look for a relation MC which is in accord with it. The task of selecting an
adequate concept among the infinitely many c-functions involves very serious
problems and difficulties. On the other hand, the general concept of regular
c-functions is simple and relatively unproblematic. As we have seen earlier
(§§ 52 f.), this concept is based on conventions which seem very plausible and
widely accepted. The properties which the regular c-functions have in common
are those which nearly all authors who have worked on probability₁ have
attributed to this concept (§ 62). Our aim is to find a comparative relation
MC which grasps those logical relations between sentences which are, so to
speak, prior to the introduction of any particular m-function for the ranges
and any particular c-function; in other words, those logical relations with
respect to which all the various c-functions agree. This suggests the
stipulation that the relation MC be defined in such a way that it is in accord
with all regular c-functions. This is formulated in the following requirement,
to which we shall later add a second one.

R80-1. First Requirement for MC (with respect to L_N). For any sentences
h, e, h’, e’, if MC(h,e,h’,e’) then, for every regular c-function c,
c(h,e) ≥ c(h’,e’).

It is quite easy to find relations which, taken as MC, fulfil this
requirement. It is clear that these relations can hold only in cases where
e and e’ are not L-false, because otherwise the c-functions are not applicable
(T55-2b). We shall presuppose for all examples of the following discussion
that this condition is fulfilled, without mentioning it explicitly in the
definiens of T₁, etc.

First let us see whether we can find quadruples of sentences h, e, h’, e’
which satisfy the following condition (3) occurring in R1:


(3) For every regular c-function c, c(h,e) ≥ c(h’,e’) .


This condition (3) is, among other cases, always satisfied if ⊢ e ⊃ h,
because then c(h,e) = 1 (T59-1b); likewise if ⊢ e’ ⊃ ~h’, because then
c(h’,e’) = 0 (T59-1e). (Here the fact is used that any c-value is ≤ 1 and
≥ 0 (T59-1a).) Thus, if we define ‘T₁(h,e,h’,e’) =Df ⊢ e ⊃ h’, then T₁,
taken as MC, fulfils the first requirement R1. The same holds for the
relation T₂ defined by ‘⊢ e’ ⊃ ~h’’; and also for the disjunction of T₁
and T₂. However, it is obvious that these relations are not adequate for
our purpose, although they fulfil R1. They are too narrow because they
are restricted to two rather trivial kinds of cases. As an example of a
nontrivial case in which condition (3) is satisfied, consider the following,
where ‘M’ is any molecular predicate: h and h’ are ‘(x)(Mx)’; e is
‘Ma . Mb’; e’ is ‘Ma’. h and h’ are here the same sentence, a simple
universal law. e’ gives just one confirming instance for the law; e gives two,
among them the one of e’. It seems plausible that the law is confirmed
by the two instances at least as strongly as by the one, in other words, that
the example satisfies condition (3). And this can indeed easily be proved
(with the help of T61-3e). More generally, let T₃ be defined as follows:
‘T₃(h,e,h’,e’) =Df h’ is the same as h, and there is a sentence i such that
e is e’ . i, and ⊢ e’ . h ⊃ i’. The example just discussed represents a
special case of T₃. It can then be shown (again with the help of T61-3e) that
T₃ satisfies the first requirement. Further, let T₄ be defined by the
following conditions: e’ is the same as e, and ⊢ h’ ⊃ h. Here, the evidences
in the two pairs are the same, and the second hypothesis L-implies the first
and hence is at least as strong as the first. Therefore it seems plausible
that the second hypothesis is confirmed by the common evidence at most as
strongly as the first, and hence that T₄ satisfies the first requirement.
And this can indeed be proved (with the help of T59-2d). Every case, i.e.,
quadruple of sentences, in which one of the relations T₁, T₂, T₃, T₄ holds,
satisfies condition (3). Therefore, also, the disjunction T₅ of these four
relations fulfils the first requirement. Shall we then take T₅ as MC? The
objection of triviality mentioned earlier against the disjunction of T₁ and
T₂ would not hold here because the cases of T₃ and T₄ are not trivial. The
reason why we would hesitate to take T₅ is rather the fact that it seems
arbitrary to stop here. There might be still other cases likewise satisfying
condition (3). If somebody then defines a relation including, in addition to
our cases, some new cases of this kind, then his concept is more
comprehensive than T₅ and therefore a more satisfactory explicatum for the
comparative concept of confirmation. And a further relation may be still
more comprehensive and hence still more satisfactory. Thus the problem is:
how can we know when we have exhausted all possible cases? If we can,
then a relation comprehending all these cases would be the most satisfactory
solution. Therefore we shall lay down as the second requirement
that the relation should have the maximum extension among all relations
fulfilling the first requirement; in other words, it should hold in all cases
satisfying condition (3). This leads to the following formulation:


R80-2. Second Requirement for MC (with respect to L_N). For any sentences
h, e, h’, e’, if c(h,e) ≥ c(h’,e’) for every regular c, then MC(h,e,h’,e’).

The two requirements R1 and R2 together stipulate that MC is to be
defined in such a manner that the following condition (4) is fulfilled:

(4) For any sentences h, e, h’, e’ (in L_N), MC(h,e,h’,e’)
    if and only if, for every regular c, c(h,e) ≥ c(h’,e’) .


It is clear that this condition (4) determines uniquely a relation MC. 
(4) says in effect that this relation is the most comprehensive relation 
which is in accord with all regular c-functions. 
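
The universal-law example discussed above (h and h’ are ‘(x)(Mx)’, e is
‘Ma . Mb’, e’ is ‘Ma’) can be explored numerically. The following is only a
minimal sketch under stated assumptions: a toy language with one predicate M
and three individuals a, b, c (eight state-descriptions), and hypothetical
helper names (`range_of`, `random_regular_m`, `m_of`, `c`) introduced here for
illustration. Random sampling of regular m-functions illustrates condition (3)
for this quadruple; it does not prove it.

```python
import random
from fractions import Fraction
from itertools import product

# Toy language (an assumption): one predicate M, three individuals a, b, c.
# A state-description fixes the truth values of Ma, Mb, Mc.
STATES = list(product([True, False], repeat=3))

def range_of(sentence):
    """Range of a sentence: the state-descriptions in which it holds."""
    return frozenset(s for s in STATES if sentence(s))

def random_regular_m(rand):
    """A regular m-function: positive values on all state-descriptions, sum 1."""
    weights = [rand.randint(1, 1000) for _ in STATES]
    total = sum(weights)
    return {s: Fraction(w, total) for s, w in zip(STATES, weights)}

def m_of(m, rng):
    """m-value of a range: sum of the m-values of its state-descriptions."""
    return sum((m[s] for s in rng), Fraction(0))

def c(m, h, e):
    """c(h,e) = m(e . h) / m(e); only defined when e is not L-false."""
    return m_of(m, e & h) / m_of(m, e)

h = range_of(lambda s: all(s))            # (x)(Mx)
e = range_of(lambda s: s[0] and s[1])     # Ma . Mb
e_prime = range_of(lambda s: s[0])        # Ma

rand = random.Random(0)
samples = [random_regular_m(rand) for _ in range(200)]
condition_3_holds = all(c(m, h, e) >= c(m, h, e_prime) for m in samples)
print(condition_3_holds)
```

Exact fractions are used so that the comparison is not affected by rounding.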

We could take (4) itself as definition for MC. But this is not the form
of definition we are looking for. We intend to give a purely comparative
definition, that is to say, one not referring to any quantitative concepts.
(4) however refers to the quantitative c-functions and thereby implicitly
to the measure functions m for the ranges of the sentences involved. A
purely comparative definition may only refer to those relations between
ranges which are independent of any particular m-function for the
ranges, hence to the inclusion relations between ranges. Inclusion between
the ranges of two sentences means the same as L-implication between the
sentences (D20-1c). Therefore, we aim at a definition of MC in terms of
L-implication (or other L-concepts related to L-implication, like L-truth
and L-falsity). A definition of this kind will be given later (D81-1). Then
it will be shown that the concept MC, although defined in a purely
comparative way, satisfies the requirements R1 and R2, which are formulated
in quantitative terms.

Statement (4) says that (3) is a sufficient and necessary condition for
MC. In order to construct a purely comparative definition, we have to
find a sufficient and necessary condition for MC in comparative terms. To
prepare the way to this goal, we shall now investigate what the condition
(3) means in terms of m-functions and ranges. c-values (for L_N) are
defined as certain quotients of m-values (D55-4 and 3). (3) means,
accordingly, that for every regular m-function m (with respect to L_N) the
following condition is fulfilled:

(5)    m(e . h) / m(e) ≥ m(e’ . h’) / m(e’) ,

where m(e) > 0 and m(e’) > 0, and hence e and e’ are not L-false. Since
m(e) and m(e’) are positive, (5) can be transformed into

(6)    m(e . h) × m(e’) ≥ m(e’ . h’) × m(e) .

e’ is L-equivalent to (e’ . h’) ∨ (e’ . ~h’) (T21-5j(2)); therefore
m(e’) = m(e’ . h’) + m(e’ . ~h’) (T57-1m). Analogously, m(e) = m(e . h) +
m(e . ~h). By substituting these values in (6) and simplifying, we obtain:

(7)    m(e . h) × m(e’ . ~h’) ≥ m(e’ . h’) × m(e . ~h) .

This is obviously satisfied if at least one of the two m-values on the right
side is 0. m(e . ~h) is 0 if and only if

(8)    ⊢ e ⊃ h ;

m(e’ . h’) is 0 if and only if

(9)    ⊢ e’ ⊃ ~h’

(according to T58-1b). These are the two trivial kinds of cases mentioned
earlier as T₁ and T₂, respectively. Now let us suppose that both m-values
on the right side of (7) are positive. Which relations must then hold
between the ranges of the sentences involved in order to assure that (7)
holds for every regular m?

The m-value for any sentence j has been defined (D55-2) as the sum
of the m-values for those Z (state-descriptions) which belong to R(j) (the
range of j). m-values for all Z may be chosen arbitrarily as any positive
numbers whose sum is 1 (D55-1). Now we shall see that (7) holds for
every regular m only if the ranges of the following three sentences are null:
e’ . h’ . ~e, e’ . h’ . e . ~h, e . ~h . ~e’; let us call them R₁, R₂, and
R₃, respectively. R₁ is included in R(e’ . h’), while it has no elements in
common with the following three ranges: R(e . h), R(e’ . ~h’), and
R(e . ~h). Therefore, if R₁ is not null, we can find a regular m-function
whose value for R₁ and thereby for e’ . h’ is arbitrarily high, that is,
as close to 1 as we want, while the other three values in (7) are small
and equal to one another; hence (7) is in this case not satisfied. R₂ is
included in R(e’ . h’) and in R(e . ~h), while it has no elements in
common with R(e . h) and with R(e’ . ~h’). Therefore, if R₂ is not
null, we can choose an m-function such that its two values on the
right side of (7) are close to 1, while the two values on the left side
are very small, and hence (7) is violated. Finally, R₃ is included in
R(e . ~h), while it has no element in common with the ranges of the
other three sentences in (7). Here, a consideration analogous to that
concerning R₁ shows that, if R₃ is not null, there is an m-function
violating (7). Thus it is a necessary condition for the general validity of
(7) that R₁, R₂, and R₃ are null. Let us now formulate this condition in
L-terms. That R₁ is null means that R(e’ . h’) ⊂ R(e), hence that
⊢ e’ . h’ ⊃ e. That R₂ is null means that R(e’ . h’ . e) ⊂ R(h), hence that
⊢ e’ . h’ . e ⊃ h. The two results combined say that ⊢ e’ . h’ ⊃ e . h.
R₃ is R(e . ~(h ∨ e’)); that this range is null means that ⊢ e ⊃ h ∨ e’.
Thus the following is a necessary condition for the general validity of (7):


(10) ⊢ e’ . h’ ⊃ e . h and simultaneously ⊢ e ⊃ h ∨ e’.


It can easily be shown that T₃ and T₄ as earlier discussed are special cases
of (10) and that (10) is much more general. However, we have not shown
that it is general enough. Let us tentatively assume that it is; in other
words, that the disjunction of (8), (9), and (10) is a sufficient and
necessary condition for the general validity of (7), and hence for (5), and
hence for (3). That is to say, we shall choose this disjunction for
constructing a tentative definition of MC in the next section (D81-1). Then
we shall prove that the assumption just made is correct, in other words, that
the relation thus defined fulfils the two requirements of adequacy earlier
stated.
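
The argument that a non-null range R₁ (the range of e’ . h’ . ~e) allows a
violation of (7) can be made concrete. The sketch below assumes a hypothetical
universe of five state-descriptions, numbered 0 to 4, and ranges chosen only
for illustration; the names `condition_7`, `uniform`, and `spiked` are not
from the text. A uniform m-function happens to satisfy (7) here, while an
m-function concentrating its value on R₁ violates it.

```python
from fractions import Fraction

# Hypothetical universe of five state-descriptions and illustrative ranges.
STATES = frozenset(range(5))
e, h = frozenset({0, 1}), frozenset({0})
e_prime, h_prime = frozenset({0, 2, 3, 4}), frozenset({0, 2})

def m_of(m, rng):
    """m-value of a range: sum of the m-values of its state-descriptions."""
    return sum((m[s] for s in rng), Fraction(0))

def condition_7(m):
    """m(e.h) x m(e'.~h') >= m(e'.h') x m(e.~h), i.e. inequality (7)."""
    left = m_of(m, e & h) * m_of(m, e_prime - h_prime)
    right = m_of(m, e_prime & h_prime) * m_of(m, e - h)
    return left >= right

# R1 is the range of e' . h' . ~e; here it contains one state-description.
r1 = (e_prime & h_prime) - e

# A uniform regular m-function and one concentrating its value on R1.
uniform = {s: Fraction(1, 5) for s in STATES}
spiked = {s: Fraction(96, 100) if s in r1 else Fraction(1, 100) for s in STATES}

print(condition_7(uniform), condition_7(spiked))
```

Both m-functions are regular (all values positive, sum 1), so the second one
shows that (7) does not hold for every regular m in this case.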


§ 81. Definition of the Comparative Concept of Confirmation MC 


Suggested by the considerations in the preceding section, a definition for MC
in terms of L-concepts is laid down (D1). Then it is shown that MC fulfils the
two requirements stated earlier, for any finite system L_N (T1). For L_∞, a
similar but somewhat restricted result is found (T2).


We shall now lay down a definition (D1) for MC, the comparative concept
of confirmation. This definition is suggested by the considerations in
the preceding section. The conditions (a) and (b) in D1 have been mentioned
at the beginning of § 80; they are obviously required because we
apply the concept of confirmation only to non-L-false evidences. (c₁) and
(c₂) in D1 correspond to (8) and (9) in § 80; these are the two trivial cases
where c(h,e) = 1 or c(h’,e’) = 0, respectively. (c₃) corresponds to (10)
in § 80; this condition applies to the nontrivial cases. The preliminary
considerations in § 80 were restricted to finite systems L_N, but the
definition D1 is laid down in a general form for any finite or infinite
system L.
+D81-1. MC(h,e,h’,e’) (with respect to a system L) =Df the following
three conditions are fulfilled (in L):
a. e is not L-false.
b. e’ is not L-false.
c. Either (c₁) ⊢ e ⊃ h,
   or (c₂) ⊢ e’ ⊃ ~h’,
   or (c₃) ⊢ e’ . h’ ⊃ e . h and simultaneously ⊢ e ⊃ h ∨ e’.
(The three conditions under (c) are meant in a nonexclusive sense: two
or all three of them may be fulfilled.)


Now we have to show that the relation MC defined by the purely
comparative definition D1 nevertheless fulfils the quantitative requirements
of adequacy discussed earlier and hence is an adequate explicatum for the
comparative explicandum (as formulated, e.g., by (1) in § 80). The first
requirement (R80-1) demands that MC be in accord with every regular
c-function. The second requirement (R80-2) demands that MC be the
most comprehensive relation fulfilling the first requirement. The following
considerations, which, although somewhat lengthy, are quite elementary,
prove that both requirements are fulfilled. This result will be
formulated in T1. The following considerations and T1 apply to any finite
system L_N. The question of L_∞ will be dealt with later, in T2.

Let h, e, h’, and e’ be any sentences in L_N such that e and e’ are not
L-false. Our aim is to show that if these sentences satisfy the following
condition (1) then they satisfy also (2), and vice versa:

(1) MC(h,e,h’,e’) ,

(2) For every regular c-function c, c(h,e) ≥ c(h’,e’) .

In what follows we shall transform condition (2) step by step into other
forms. Each step here made is a reversible logical transformation, that is
to say, the conditions which will be stated are not only logical
consequences of (2) but logically equivalent to (2), in other words, each is a
sufficient and necessary condition for (2). In this way, (2) will be
transformed in turn into (3), (4), (5), (6), (7), (8), (9), (10), (11), and
thereby into (1). Thus the aim will be reached.

We form certain conjunctions out of the given four sentences and their
negations and denote them by ‘j₁’, ..., ‘j₉’ as follows:

j₁ is e . e’ . h . h’,
j₂ is e . e’ . h . ~h’,
j₃ is e . e’ . ~h . h’,
j₄ is e . e’ . ~h . ~h’,
j₅ is e . ~e’ . h,
j₆ is e . ~e’ . ~h,
j₇ is ~e . e’ . h’,
j₈ is ~e . e’ . ~h’,
j₉ is ~e . ~e’.
These nine j-sentences are L-exclusive in pairs and L-disjunct. [This can
be shown as follows. The j-sentences may be regarded as formed in the
following way. Consider the truth-table for the four sentences e, e’, h,
and h’ as explained in § 21B, and let k₁, ..., k₁₆ be the sixteen
conjunctions corresponding to the lines of the truth-table. Then each of the
sentences j₁, j₂, j₃, and j₄ is one of these k-sentences; each of j₅, j₆, j₇,
and j₈ is L-equivalent to a disjunction of two k-sentences; and j₉ is
L-equivalent to a disjunction of four k-sentences. Each k-sentence occurs
here in exactly one j-sentence. Hence the assertion (from T21-7a,b).]
Therefore the ranges of the j-sentences constitute an exhaustive and
nonoverlapping division for the state-descriptions Z in L_N. [Incidentally,
the j-sentences are to some extent analogous to the Z themselves, which are
conjunctions of atomic sentences and their negations. However, there is one
important difference. The atomic sentences are assumed to be logically
independent of one another, and therefore all Z are factual. On the other
hand, among the sentences h, e, h’, and e’, some L-relations (e.g.,
L-implication) may hold; then some of the j-sentences will be L-false. We
shall soon discuss cases of this kind.] We can easily see that the following
L-equivalences hold (in analogy to T21-7d):

⊢ e ≡ j₁ ∨ j₂ ∨ j₃ ∨ j₄ ∨ j₅ ∨ j₆ ,

⊢ e’ ≡ j₁ ∨ j₂ ∨ j₃ ∨ j₄ ∨ j₇ ∨ j₈ ,

⊢ e . h ≡ j₁ ∨ j₂ ∨ j₅ ,

⊢ e’ . h’ ≡ j₁ ∨ j₃ ∨ j₇ .
For any regular m-function m, its value for each of these disjunctions is
the sum of the values for the components, since these are L-exclusive in
pairs (T57-1n). Writing ‘mₙ’ for ‘m(jₙ)’ (n = 1, ..., 9), we obtain:

m(e) = m₁ + m₂ + m₃ + m₄ + m₅ + m₆ ,

m(e’) = m₁ + m₂ + m₃ + m₄ + m₇ + m₈ ,

m(e . h) = m₁ + m₂ + m₅ ,

m(e’ . h’) = m₁ + m₃ + m₇ .
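
The division into nine j-sentences can be checked mechanically when sentences
are represented by their ranges. The sketch below is an illustration under
assumptions not found in the text: a hypothetical universe of sixteen
state-descriptions and four randomly chosen ranges. The helper names
`j_cells` and `union` are introduced here only for this purpose.

```python
import random

def j_cells(h, e, h_prime, e_prime, universe):
    """Ranges of the nine j-sentences formed from h, e, h', e'."""
    not_e, not_e_prime = universe - e, universe - e_prime
    not_h, not_h_prime = universe - h, universe - h_prime
    return {
        1: e & e_prime & h & h_prime,
        2: e & e_prime & h & not_h_prime,
        3: e & e_prime & not_h & h_prime,
        4: e & e_prime & not_h & not_h_prime,
        5: e & not_e_prime & h,
        6: e & not_e_prime & not_h,
        7: not_e & e_prime & h_prime,
        8: not_e & e_prime & not_h_prime,
        9: not_e & not_e_prime,
    }

# Hypothetical universe and randomly chosen ranges (illustration only).
rand = random.Random(1)
universe = frozenset(range(16))

def random_range():
    return frozenset(s for s in universe if rand.random() < 0.5)

h, e, h_prime, e_prime = (random_range() for _ in range(4))
cells = j_cells(h, e, h_prime, e_prime, universe)

def union(*indices):
    """Range of the disjunction of the j-sentences with the given indices."""
    return frozenset().union(*(cells[n] for n in indices))

# L-exclusive in pairs and L-disjunct: the nine ranges partition the universe.
pairwise_disjoint = all(not (cells[a] & cells[b])
                        for a in cells for b in cells if a < b)
exhaustive = union(*range(1, 10)) == universe

# The four L-equivalences stated above, read as identities between ranges.
equivalences_hold = (e == union(1, 2, 3, 4, 5, 6)
                     and e_prime == union(1, 2, 3, 4, 7, 8)
                     and e & h == union(1, 2, 5)
                     and e_prime & h_prime == union(1, 3, 7))
print(pairwise_disjoint, exhaustive, equivalences_hold)
```

Because the nine ranges are disjoint, the m-value of each disjunction is the
sum of the m-values of its components, which gives the four sums above.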


Condition (2) can be expressed in terms of m-functions (see (5) and (6)
in § 80) as follows:


(3) For every regular m-function m,
    m(e . h) × m(e’) ≥ m(e’ . h’) × m(e) .


Substituting here the m-values just found and multiplying out, we obtain
on either side a sum whose terms are products of two mₙ-values each. If
equal terms on both sides are omitted, the remaining terms can be
combined as follows:

(4) For every regular m,

    (m₁ + m₂ + m₅) × (m₂ + m₈) + (m₂ + m₅) × m₄ ≥

    (m₁ + m₃ + m₇) × (m₃ + m₆) + (m₃ + m₇) × m₄ .
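
The step from (3) to (4) only omits equal terms on both sides, so the
difference between the two sides must be the same in both forms. The
following sketch checks this on randomly chosen nonnegative integers in place
of the values m₁, ..., m₈ (m₉ does not occur); the function names are
hypothetical, and a numerical check on samples is of course no substitute for
multiplying out.

```python
import random

def sides_of_3(m1, m2, m3, m4, m5, m6, m7, m8):
    """Both sides of (3), with m(e), m(e'), m(e.h), m(e'.h') written as sums."""
    left = (m1 + m2 + m5) * (m1 + m2 + m3 + m4 + m7 + m8)    # m(e.h) x m(e')
    right = (m1 + m3 + m7) * (m1 + m2 + m3 + m4 + m5 + m6)   # m(e'.h') x m(e)
    return left, right

def sides_of_4(m1, m2, m3, m4, m5, m6, m7, m8):
    """Both sides of (4), after equal terms have been omitted."""
    left = (m1 + m2 + m5) * (m2 + m8) + (m2 + m5) * m4
    right = (m1 + m3 + m7) * (m3 + m6) + (m3 + m7) * m4
    return left, right

rand = random.Random(2)
same_difference = True
for _ in range(1000):
    values = [rand.randint(0, 20) for _ in range(8)]
    left3, right3 = sides_of_3(*values)
    left4, right4 = sides_of_4(*values)
    if left3 - right3 != left4 - right4:
        same_difference = False
print(same_difference)
```

Integer values are used so that the comparison is exact; since both sides are
homogeneous of degree two, the normalization to sum 1 plays no role here.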


Let us write ‘Rₙ’ as short for ‘R(jₙ)’ (n = 1, ..., 9). These nine ranges
form, as mentioned above, a complete division for the Z. Therefore, if we
choose an arbitrary sequence of nine nonnegative real numbers r₁, r₂,
..., r₉ such that (a) rₙ = 0 if and only if jₙ is L-false, and (b) r₁ + r₂ +
... + r₉ = 1, then we can find a regular m such that, for every n from
1 to 9, m(jₙ) = rₙ. (This follows for any L-false jₙ from (a) and D55-2a;
for any other jₙ from (b) and T58-1l.)

Now we shall show that (4) holds if and only if the right-hand side in
(4) equals 0, that is:


(5) For every regular m,
    (m₁ + m₃ + m₇) × (m₃ + m₆) + (m₃ + m₇) × m₄ = 0 .


Proof. (a) If the sentences h, e, h’, e’ are such that (5) is satisfied, then
obviously (4) is also satisfied, because m-values are always nonnegative. (b) Now
the converse must be shown. Suppose that (5) is not satisfied. This means that
there is an m-function, say m’, such that the left side of the equation in (5) is > 0,
hence also the right side in (4). Now we take another m-function m” which has
the following values. For n = 1, 3, 4, 6, and 7, mi’ = m}; mi’ = 3m{; my! = 
4m;; mg’ = mg. Let us abbreviate (4) for m” by ‘g: + qa = qs + qe’, where the 
q-symbols stand for the four products occurring. Then we easily see from the
defined values of m” that q₁ ≤ q₃ and q₂ ≤ q₄; moreover, since q₃ + q₄ > 0,
either q₃ or q₄ (or both) > 0, and hence either q₁ < q₃ or q₂ < q₄ (or both).
Hence q₁ + q₂ < q₃ + q₄. Thus m” does not satisfy (4). Therefore, if (4) is
satisfied then so is (5).


Since m-values are nonnegative, the sum in (5) is 0 if and only if both
terms of the sum are 0. A product is 0 if and only if at least one factor is 0.
A sum of certain m-values is 0 if and only if all these m-values are 0.
Therefore (5) holds if and only if (6) holds:
(6) For every regular m,
    (a) either m₁ = m₃ = m₇ = 0 or m₃ = m₆ = 0;
    and (b) either m₃ = m₇ = 0 or m₄ = 0.
(‘Either ... or’ is here always understood in the nonexclusive sense.)
Combining each of the two cases in (a) with each in (b), we obtain:
(7) For every regular m,
    either (a) m₁ = m₃ = m₇ = 0,

    or (b) m₁ = m₃ = m₄ = m₇ = 0,

    or (c) m₃ = m₆ = m₇ = 0,

    or (d) m₃ = m₄ = m₆ = 0.
Here we may omit (b) because (a) follows from it. (‘A or (A and B)’
means the same as ‘A’, T21-5p(2).) The m-values here are those of the
j-sentences with the same subscripts. m(jₙ) = 0 if and only if jₙ is L-false
(T58-1b, see also T58-1h). Several sentences are L-false if and only if their
disjunction is L-false (T20-2q). Hence we obtain (in the order (7)(d),
(a), (c)):

(8) Either (a) j₃ ∨ j₄ ∨ j₆ is L-false ,
    or (b) j₁ ∨ j₃ ∨ j₇ is L-false ,
    or (c) j₃ ∨ j₇ is L-false, and j₆ is L-false .
Now we eliminate the j-abbreviations. (For (b), we use the last of the
four L-equivalences mentioned earlier.)
(9) Either (a) (e . e’ . ~h . h’) ∨ (e . e’ . ~h . ~h’) ∨ (e . ~e’ . ~h)
        is L-false ,
    or (b) e’ . h’ is L-false ,
    or (c) (~e . e’ . h’) ∨ (e . e’ . ~h . h’) is L-false and e . ~e’ . ~h
        is L-false .
We transform this as follows (for (a): T21-5j(2); for (c): T21-5f(1)):
(10) Either (a) e . ~h is L-false ,
     or (b) e’ . h’ is L-false ,
     or (c) e’ . h’ . [~e ∨ (e . ~h)] is L-false and e . ~(e’ ∨ h) is
        L-false .

~e ∨ (e . ~h) is L-equivalent to ~e ∨ ~h (T21-5m(4), T21-3c,
T21-5s(1)), and hence to ~(e . h) (T21-5f(3)). Now we transform in
terms of L-truth (T20-1a, T21-5g(1)):

(11) Either (a) ⊢ e ⊃ h ,
     or (b) ⊢ e’ ⊃ ~h’ ,
     or (c) ⊢ e’ . h’ ⊃ e . h and ⊢ e ⊃ (e’ ∨ h) .

Since we have presupposed that e and e’ are not L-false, we see from
D1 that (11) holds if and only if


(12) MC(h,e,h’,e’) .
Thus we have shown that (2) holds if and only if (12) holds. This result
is stated in the following theorem.


+T81-1. Let h, e, h’, e’ be any sentences in L_N. Let e and e’ be
non-L-false in L_N.
a. If MC(h,e,h’,e’) (in L_N), then for every regular c (with respect to L_N),
c(h,e) ≥ c(h’,e’).
b. (The converse of (a).) If for every regular c (with respect to L_N)
c(h,e) ≥ c(h’,e’), then MC(h,e,h’,e’).
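
T81-1 can be given a small sanity check by brute force. The sketch below
rests on assumptions that are not part of the text: a hypothetical universe of
only three state-descriptions, sentences identified with their ranges, and a
finite family of regular m-functions (`M_FAMILY`) in which each
state-description receives weight 1 or 1000 before normalization. A finite
family cannot replace the quantification over every regular m; it is used
here only because, in so small a universe, concentrating the value on one or
two state-descriptions is enough to exhibit a violating m-function whenever
MC fails.

```python
from fractions import Fraction
from itertools import combinations, product

STATES = (0, 1, 2)                      # three state-descriptions (assumption)
UNIVERSE = frozenset(STATES)
RANGES = [frozenset(c) for r in range(len(STATES) + 1)
          for c in combinations(STATES, r)]

def mc(h, e, h_prime, e_prime):
    """D81-1 with L-implication read as inclusion between ranges."""
    if not e or not e_prime:
        return False
    c1 = e <= h
    c2 = e_prime <= UNIVERSE - h_prime
    c3 = (e_prime & h_prime) <= (e & h) and e <= (h | e_prime)
    return c1 or c2 or c3

# Family of regular m-functions: weight 1 or 1000 per state-description.
M_FAMILY = []
for weights in product([1, 1000], repeat=len(STATES)):
    total = sum(weights)
    M_FAMILY.append({s: Fraction(w, total) for s, w in zip(STATES, weights)})

def m_of(m, rng):
    """m-value of a range: sum of the m-values of its state-descriptions."""
    return sum((m[s] for s in rng), Fraction(0))

def quantitative(h, e, h_prime, e_prime):
    """c(h,e) >= c(h',e') for every m in M_FAMILY, in the product form (3)."""
    return all(
        m_of(m, e & h) * m_of(m, e_prime)
        >= m_of(m, e_prime & h_prime) * m_of(m, e)
        for m in M_FAMILY
    )

# Quadruples (h, e, h', e') with non-L-false e and e' where the two disagree.
mismatches = [
    q for q in product(RANGES, repeat=4)
    if q[1] and q[3] and mc(*q) != quantitative(*q)
]
print(len(mismatches))
```

The product form (3) is used instead of the quotients so that no division is
needed.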


T1 shows that MC, as defined by D1, fulfils the requirements R80-1
and 2 which we have laid down for an adequate comparative concept of
confirmation. Moreover, MC is the only relation fulfilling both
requirements, because, as we have seen earlier, these requirements determine
uniquely one relation; nevertheless, definitions for MC differing from D1
considerably in their formulations while stating logically equivalent
conditions are of course possible. In the terminology of § 80, T1a says that
MC is in accord with every regular c-function, and it, together with T1b,
says that MC is the most comprehensive relation for which this holds.

The definition D1 of MC has been formulated in a general way for any
finite or infinite system L. T1, however, refers only to the finite systems
L_N. Now we shall see how the results can be transferred to L_∞. In the
following theorem T2, (a) corresponds to T1a, and (b) to T1b. However,
we restrict T2b to nongeneral sentences. The reason for this is the fact,
which we found earlier, that in L_∞ the relations between m- and c-values,
on the one hand, and L-concepts, on the other hand, are simple only for
nongeneral sentences but rather complicated for general sentences (see
the discussion in § 58, and T59-5).


+T81-2. Let h, e, h’, and e’ be sentences in L_∞. Let e and e’ be
non-L-false in every system L in which they occur.
a. If MC(h,e,h’,e’) in L_∞ and ∞c is any regular c-function for L_∞
possessing values for the pairs h,e and h’,e’, then ∞c(h,e) ≥ ∞c(h’,e’).

Proof. Let the conditions be fulfilled. Then D1c is fulfilled in L_∞ and hence
likewise in every system L_N of a final segment of the sequence of finite systems
(T20-11). It follows from the assumptions stated that D1a and b are fulfilled in
every system L in which e and e’ occur. Thus in every system of a final segment
all conditions in D1 are fulfilled and hence MC(h,e,h’,e’). Let Nc be the
c-sequence on which the given ∞c is based. Then in the final segment
Nc(h,e) ≥ Nc(h’,e’) (T1a). Therefore (T40-21f) ∞c(h,e) ≥ ∞c(h’,e’).


b. Let h, e, h’, and e’ be nongeneral (hence every regular ∞c has values
for the pairs h,e and h’,e’). If for every regular ∞c in L_∞ ∞c(h,e) ≥
∞c(h’,e’), then MC(h,e,h’,e’) in L_∞.


Proof. Let the conditions be fulfilled. Let &y be any of those finite systems 
in which all four sentences occur. These systems form a final segment. Then 
for every regular c-function we for fw welh,e) 2 we(h',e’) (T57-6c). Therefore 
MC(h,e,h',e’) in Qw (Trb) and hence also in lo (T20-10). 


Examples. MC holds in the following three examples. They belong to the
nontrivial kind D1c3.

First example. e and e' are 'P1a'; h is 'P2a ∨ P3a'; h' is 'P2a'. We see easily
that ⊢ e' ⊃ e, ⊢ h' ⊃ h, ⊢ e ⊃ e'; hence D1c3 is fulfilled. In this simple case it
is also rather obvious that the explicandum holds, that is to say, that h is
confirmed by the common evidence at least as strongly as h', because h is weaker
than h' (i.e., ⊢ h' ⊃ h but not ⊢ h ⊃ h').

Second example. e is ‘(P:b V Pb) » (Pab V P)’; e' is ‘Pb’; h and h’ are 
‘Pb’. We see easily that ⊢ e' . h' ⊃ e, ⊢ h' ⊃ h, and ⊢ e ⊃ h ∨ e'. Thus D1c3 is
fulfilled. This case is not quite as simple as (1), although the hypotheses are
the same; there is no simple logical relation between e and e'. Therefore we
cannot see immediately that the explicandum holds. However, the following
method can be used for an examination of this example; it is analogous to the
method which led above to T1 but much more simple because of the simplicity of
the example. We regard the sentences e, h, e', and h' as sentences of 𝔏_1 (§ 31),
and hence transform them into disjunctions of Q-sentences (D31-1d) with 'Q1',
..., 'Q8'. These Q-sentences are the 𝔷 in this system because N = 1 (D34-1a).
Then we can easily show that for any regular m and c, the conditions (6), and
hence (5), and hence (2) in § 80 are fulfilled.

Third example. We combine the sentences of the preceding examples by
taking as e here the conjunction of the sentences e in the first and the second
examples, and likewise with h, e', and h'. Thus e is here ‘P,a « (Pb V Pid) «
(Pab V P3b)’; h is “(Pa V Pa) « P28’; e' is ‘Pua s Pab’; h' is ‘Paa u Pb’. Here,
neither the evidences are L-equivalent nor the hypotheses. But we see easily
that ⊢ e' . h' ⊃ e, ⊢ h' ⊃ h, ⊢ e ⊃ h ∨ e', and hence D1c3 is fulfilled.


§ 82. Some Concepts Based on MC


On the basis of MC, three other comparative concepts are defined. As MC
corresponds to the relation ≥ between c-values, Eq is defined (D1) in such a
way that it corresponds to the relation = (T1a), and Gr (D2) corresponds to
> (T1b). Finally, two pairs of sentences are called comparable if the relation
MC holds between the first and the second or between the second and the
first (D3). Some theorems are given which state sufficient and necessary
conditions for the concepts defined in terms of L-concepts.


We shall define here four concepts on the basis of the comparative
concept of confirmation MC. These four concepts are, like MC, tetradic
relations between sentences in any system 𝔏. Here, as in the case of MC, it
is convenient to regard the concepts as relations between two pairs of
sentences. In this sense, we use the term 'the converse of MC' (without
a special symbol) for that relation which holds between two pairs h',e'
and h,e if and only if MC holds between the pairs h,e and h',e', that is,
between the same pairs in the opposite direction.

The relation Eq will be defined in such a way that it holds between two
pairs of sentences if and only if both MC and its converse hold between
them (D1). The symbol 'Eq' is intended to suggest 'equal'; this is justified
by T1a. The relation Gr is to hold if MC holds but not the converse of
MC (D2). The symbol 'Gr' is to suggest 'greater than'; it is, however, to
be noticed that, if Gr holds between two pairs of sentences, the first pair
does not necessarily have a greater c-value than the second for every
regular c; in general, only a somewhat weaker statement holds (T1b). We
shall say that two pairs of sentences h,e and h',e' are comparable if the
relation MC holds between these pairs in at least one direction (D3).
Otherwise the pairs are said to be incomparable. In the latter case, no
purely comparative relation holds between the pairs in the sense that no
general relation between their c-values holds for all regular c-functions;
some c-functions rate the one pair higher and other c-functions the other
pair (T1d).
+D82-1. Eq(h,e,h',e') (in a finite or infinite system 𝔏) =Df MC(h,e,h',e')
and MC(h',e',h,e).
+D82-2. Gr(h,e,h',e') (in 𝔏) =Df MC(h,e,h',e') and not MC(h',e',h,e).
+D82-3. The pairs of sentences h,e and h',e' are comparable (in 𝔏) =Df
MC(h,e,h',e') or MC(h',e',h,e) (or both).
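
The definitions D82-1 to D82-3 can be tried out mechanically on small cases. The following Python sketch is an illustration added here, not part of the original system. It assumes a toy propositional language with three atomic sentences A, B, C, represents every sentence by its range (a bitmask over the eight state-descriptions), and takes the three conditions of D81-1 in the form in which they reappear in T82-5c. All function names are hypothetical.

```python
N_ATOMS = 3
N_STATES = 2 ** N_ATOMS            # number of state-descriptions
FULL = 2 ** N_STATES - 1           # range of an L-true sentence

def atom(i):
    """Range of the i-th atomic sentence as a bitmask over state-descriptions."""
    return sum(2 ** s for s in range(N_STATES) if (s >> i) & 1)

def neg(a):
    """Range of the negation of a."""
    return FULL & ~a

def l_implies(a, b):
    """L-implication: the range of a is included in the range of b."""
    return (a & neg(b)) == 0

def mc(h, e, h2, e2):
    """MC(h,e,h',e') by the conditions (a), (b), (c1)-(c3) of D81-1."""
    if e == 0 or e2 == 0:          # (a), (b): both evidences non-L-false
        return False
    return (l_implies(e, h)                          # (c1)
            or l_implies(e2, neg(h2))                # (c2)
            or (l_implies(e2 & h2, e & h)            # (c3), first part
                and l_implies(e, h | e2)))           # (c3), second part

def eq(h, e, h2, e2):              # D82-1
    return mc(h, e, h2, e2) and mc(h2, e2, h, e)

def gr(h, e, h2, e2):              # D82-2
    return mc(h, e, h2, e2) and not mc(h2, e2, h, e)

def comparable(h, e, h2, e2):      # D82-3
    return mc(h, e, h2, e2) or mc(h2, e2, h, e)

A, B, C = atom(0), atom(1), atom(2)
print(gr(B | C, A, B, A))          # weaker hypothesis on the same evidence
print(comparable(B, A, C, A))      # neither hypothesis follows from the other
```

Identifying a sentence with its range makes L-equivalent sentences indistinguishable, which is harmless here because MC is invariant under L-equivalent replacements.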
The following theorem T1 states relations between the comparative
concepts just defined and the quantitative concepts, that is, the regular
c-functions. This theorem is restricted to finite systems 𝔏_N for the reasons
mentioned earlier in connection with T81-1. However, if the four sentences
involved are nongeneral and e and e' are non-L-false in every system in
which they occur, then the assertions of this theorem hold likewise for
𝔏_∞ (see T81-2).
+T82-1. Let h, e, h', and e' be sentences in 𝔏_N. Let e and e' be
non-L-false in 𝔏_N.
a. Eq(h,e,h',e') if and only if, for every regular c with respect to 𝔏_N,
c(h,e) = c(h',e'). (From D1, T81-1.)

b. Gr(h,e,h',e') if and only if, for every regular c with respect to 𝔏_N,
c(h,e) ≥ c(h',e') and, for at least one such c, c(h,e) > c(h',e'). (From
D2, T81-1.)

c. The pairs h,e and h',e' are comparable if and only if either, for every
regular c with respect to 𝔏_N, c(h,e) ≥ c(h',e') or, for every such c,
c(h',e') ≥ c(h,e). (From D3, T81-1.)

d. The pairs h,e and h',e' are incomparable if and only if there are two
regular c-functions c and c' with respect to 𝔏_N, such that c(h,e) >
c(h',e') and c'(h',e') > c'(h,e). (From (c).)
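
T82-1b and T82-1d can be illustrated numerically. The sketch below is an added illustration under stated assumptions: a toy language with three atomic sentences A, B, C, sentences represented by bitmasks over the eight state-descriptions, and a regular measure function represented by a list of positive weights (normalization is omitted because c is a quotient). The names `m`, `c`, `w1`, `w2` are hypothetical.

```python
from fractions import Fraction
import random

N_STATES = 8                       # state-descriptions for three assumed atoms

def atom(i):
    """Range of the i-th atomic sentence as a bitmask."""
    return sum(2 ** s for s in range(N_STATES) if (s >> i) & 1)

def m(weights, a):
    """Measure of sentence a: sum of the weights of its state-descriptions."""
    return sum(w for s, w in enumerate(weights) if (a >> s) & 1)

def c(weights, h, e):
    """Degree of confirmation c(h,e) = m(e.h) / m(e)."""
    return Fraction(m(weights, e & h), m(weights, e))

A, B, C = atom(0), atom(1), atom(2)
rng = random.Random(0)

# Gr(B v C, A, B, A) holds, so c(B v C, A) is never below c(B, A) (T82-1b).
never_lower = True
for _ in range(1000):
    w = [rng.randint(1, 9) for _ in range(N_STATES)]   # positive, hence regular
    if c(w, B, A) > c(w, B | C, A):
        never_lower = False

# The pairs B,A and C,A are incomparable: two measures rank them oppositely (T82-1d).
w1 = [1] * N_STATES
w1[0b011] = 10                     # extra weight on the state A.B.~C
w2 = [1] * N_STATES
w2[0b101] = 10                     # extra weight on the state A.~B.C
first_higher = c(w1, B, A) > c(w1, C, A)
second_higher = c(w2, C, A) > c(w2, B, A)
print(never_lower, first_higher, second_higher)
```

Random sampling only illustrates part (b); it does not prove it. Part (d), in contrast, is settled by exhibiting the two measures.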


The remaining theorems of this section and the next one do not involve 
quantitative concepts, that is, c-functions, but state relations between the 
defined comparative concepts and the original L-concepts. All these
theorems hold for any finite or infinite system 𝔏 which contains the sentences
in question.

The following theorem T3, which follows immediately from D81-1,
states a sufficient and necessary condition for the converse of MC. It
serves merely as a lemma for some later theorems.


T82-3. Lemma. MC(h',e',h,e) if and only if the following three
conditions are fulfilled:
a. e' is not L-false.
b. e is not L-false.
c. Either (c1) ⊢ e' ⊃ h',
   or (c2) ⊢ e ⊃ ~h,
   or (c3) ⊢ e . h ⊃ e' . h' and ⊢ e' ⊃ h' ∨ e.


The following theorems T4 to T7 state sufficient and necessary
conditions for the concepts introduced in this section. They refer not to MC
but directly to the original L-concepts. (They could be taken as
alternative definitions for the new concepts on the basis of L-concepts.)


+T82-4. Eq(h,e,h',e') if and only if e and e' are non-L-false and, in
addition, at least one of the following three conditions is fulfilled:
Either a. ⊢ e ⊃ h and ⊢ e' ⊃ h';
or b. ⊢ e ⊃ ~h and ⊢ e' ⊃ ~h';
or c. ⊢ e ≡ e' and ⊢ e . h ≡ e' . h'.


Proof. 1. Suppose that Eq(h,e,h',e'). Then (D1) both MC and its converse
hold; hence the conditions both in D81-1 and in T3 are fulfilled. According to
D81-1a and b, which are the same as T3b and a, e and e' are non-L-false.
Further, the conditions under (c) in D81-1 and in T3 are fulfilled. Let us
denote the three items under (c) in D81-1 by 'c1', 'c2', and 'c3', and those in
T3 by 'c1'', 'c2'', and 'c3''. Thus the following holds: (c1 or c2 or c3) and
(c1' or c2' or c3'). This can be transformed by distribution (T21-5m(3)) as
follows: (c1 and c1') or (c1 and c2') or (c1 and c3') or (c2 and c1') or
(c2 and c2') or (c2 and c3') or (c3 and c1') or (c3 and c2') or (c3 and c3').
Now we shall examine each of these nine disjunctive components. We must show
for each of them either that one of the three conditions in our theorem follows
from it, or that the component cannot hold under the assumption here made.
(c1 and c1') is the condition (a) in this theorem. (c1 and c2') says that
⊢ e ⊃ h . ~h (T21-5m(8)); hence e would be L-false, which is impossible on our
assumption. (c1 and c3') says that ⊢ e ⊃ h and ⊢ e . h ⊃ e' . h' and
⊢ e' ⊃ h' ∨ e; from this it follows that ⊢ e' ⊃ h' ∨ (e . h) (T21-5i(1)),
hence ⊢ e' ⊃ h' ∨ h', hence ⊢ e' ⊃ h', hence (a). (c2 and c1') says that
⊢ e' ⊃ ~h' . h'; thus e' would be L-false, which is not possible. (c2 and c2')
is (b) in this theorem. (c2 and c3') says that (1) ⊢ e' ⊃ ~h' and
(2) ⊢ e . h ⊃ e' . h' and a third item; it follows from (1) that ⊢ ~(e' . h')
(T21-5f(3)); from this and (2) that ⊢ ~(e . h) (T21-5h), hence ⊢ e ⊃ ~h; thus
(b) in this theorem follows. (c3 and c1') says that ⊢ e' . h' ⊃ e . h and
⊢ e ⊃ h ∨ e' and ⊢ e' ⊃ h'; hence it follows that ⊢ e ⊃ h ∨ (e' . h')
(T21-5i(1)); hence ⊢ e ⊃ h ∨ h, hence ⊢ e ⊃ h; thus (a) follows. (c3 and c2')
says that ⊢ e' . h' ⊃ e . h and ⊢ e ⊃ h ∨ e' and ⊢ e ⊃ ~h; hence
⊢ e' . h' ⊃ ~h . h, hence e' . h' is L-false, hence ⊢ e' ⊃ ~h'; thus (b)
follows. Finally, (c3 and c3') says that (1) ⊢ e' . h' ⊃ e . h,
(2) ⊢ e ⊃ h ∨ e', (3) ⊢ e . h ⊃ e' . h', and (4) ⊢ e' ⊃ h' ∨ e; from (2) it
follows that ⊢ e ⊃ (e . h) ∨ e', hence with (3) ⊢ e ⊃ e' ∨ e', hence ⊢ e ⊃ e';
and analogously from (4) and (1) that ⊢ e' ⊃ e; therefore ⊢ e ≡ e'; from this
and (3) and (1), (c) in this theorem follows. 2. The converse says that, if the
conditions stated in the theorem are fulfilled, Eq holds, that is, that the
conditions in both D81-1 and T3 are fulfilled. The conditions D81-1a and b and
T3a and b, namely that e and e' are non-L-false, are stated explicitly in the
present theorem. Thus it remains to be shown that the conditions under (c) in
D81-1 and T3 are fulfilled. They may be abbreviated as above. If (a) in this
theorem holds, then c1 and c1' hold, hence D81-1c and T3c hold. If (b) here
holds, then c2 and c2' hold, hence again D81-1c and T3c hold. If (c) in this
theorem holds, then c3 and c3' hold, thus again D81-1c and T3c hold.


+T82-5. Gr(h,e,h',e') if and only if the following four conditions are
simultaneously fulfilled:
a. Not ⊢ e ⊃ ~h. (Hence e is not L-false.)
b. Not ⊢ e' ⊃ h'. (Hence e' is not L-false.)
c. Either (c1) ⊢ e ⊃ h,
   or (c2) ⊢ e' ⊃ ~h',
   or (c3) ⊢ e' . h' ⊃ e . h and ⊢ e ⊃ h ∨ e'.
d. Either (d1) not ⊢ e . h ⊃ e' . h',
   or (d2) not ⊢ e' ⊃ h' ∨ e.


Proof. 1. Suppose that Gr(h,e,h',e'). Then (D2) MC holds but not the
converse of MC. Therefore the conditions of D81-1 are fulfilled but not those
of T3. We have to show that the above conditions (a) to (d) are fulfilled.
a. ⊢ e ⊃ ~h cannot hold because otherwise it would follow with D81-1c1 that
⊢ e ⊃ ~h . h; hence e would be L-false, contrary to D81-1a. b. Analogously,
⊢ e' ⊃ h' cannot hold because of D81-1c2 and D81-1b. c. From D81-1c.
d. Suppose that (d) did not hold. Then its negation would hold, which is the
same as T3c3; hence T3c would hold. From (a) and (b), which we have just
derived, it follows that e and e' are not L-false; hence T3a and b hold. Thus
the conditions of T3 would be fulfilled, contrary to our assumption. Therefore
(d) must hold. 2. Suppose that the four conditions (a) to (d) are fulfilled. We
have to show that MC holds but not its converse, in other words, that all
conditions of D81-1 are fulfilled but not all conditions of T3. D81-1a and b
follow from (a) and (b) here. D81-1c is the same as (c) here. Now we shall show
that T3c is not fulfilled. This means that the three conditions (c1), (c2), and
(c3) in T3 are all violated. For (c1), this follows from (b) here; for (c2),
from (a) here; for (c3), from (d) here.

Examples for T5. Gr holds in the three examples given at the end of § 81.
There we have seen that D81-1c3 is fulfilled, and this is the same as T5c3 here.
We can further easily see that T5a, b, and d1 are fulfilled in these examples.
In the first example, ⊢ e ⊃ e', but not ⊢ e . h ⊃ h'. In the second example,
⊢ h ⊃ h', but not ⊢ e . h ⊃ e'. In the third example, neither ⊢ e . h ⊃ e' nor
⊢ e . h ⊃ h'. Thus d1 is always fulfilled.
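
The equivalences asserted in T82-4 and T82-5 can be checked exhaustively in a small finite case. The following Python sketch is an added illustration, assuming a toy language with two atomic sentences, so that every sentence is one of the 16 ranges over four state-descriptions; the function names are hypothetical, and D81-1 is taken in the form restated in T82-5c.

```python
from itertools import product

FULL = 0b1111                      # four state-descriptions (two assumed atoms)

def neg(a):
    return FULL & ~a

def imp(a, b):
    """L-implication as inclusion of ranges."""
    return (a & neg(b)) == 0

def mc(h, e, h2, e2):
    """MC(h,e,h',e') by D81-1."""
    return (e != 0 and e2 != 0 and
            (imp(e, h) or imp(e2, neg(h2))
             or (imp(e2 & h2, e & h) and imp(e, h | e2))))

def t82_4(h, e, h2, e2):
    """Right-hand side of T82-4 (conditions for Eq)."""
    return (e != 0 and e2 != 0 and
            ((imp(e, h) and imp(e2, h2))
             or (imp(e, neg(h)) and imp(e2, neg(h2)))
             or (e == e2 and (e & h) == (e2 & h2))))

def t82_5(h, e, h2, e2):
    """Right-hand side of T82-5 (conditions for Gr)."""
    a = not imp(e, neg(h))
    b = not imp(e2, h2)
    c = (imp(e, h) or imp(e2, neg(h2))
         or (imp(e2 & h2, e & h) and imp(e, h | e2)))
    d = (not imp(e & h, e2 & h2)) or (not imp(e2, h2 | e))
    return a and b and c and d

mismatches = 0
for h, e, h2, e2 in product(range(16), repeat=4):
    is_eq = mc(h, e, h2, e2) and mc(h2, e2, h, e)          # D82-1
    is_gr = mc(h, e, h2, e2) and not mc(h2, e2, h, e)      # D82-2
    if is_eq != t82_4(h, e, h2, e2) or is_gr != t82_5(h, e, h2, e2):
        mismatches += 1
print(mismatches)
```

Such a check covers only this one small system; the proofs above establish the theorems for any finite or infinite system.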

T82-6. The pairs h,e and h',e' are comparable if and only if e and e' are
non-L-false and, in addition, at least one of the following six conditions
is fulfilled:

Either a. ⊢ e ⊃ h;
or b. ⊢ e' ⊃ ~h';
or c. ⊢ e' . h' ⊃ e . h and ⊢ e ⊃ h ∨ e';
or d. ⊢ e' ⊃ h';
or e. ⊢ e ⊃ ~h;
or f. ⊢ e . h ⊃ e' . h' and ⊢ e' ⊃ h' ∨ e.
(From D3, D81-1, T3.)

+T82-7. The pairs h,e and h',e' are incomparable if either e is L-false
or e' is L-false or the following six conditions are fulfilled simultaneously:

a. Not ⊢ e ⊃ h.

b. Not ⊢ e' ⊃ ~h'.

c. Either not ⊢ e' . h' ⊃ e . h, or not ⊢ e ⊃ h ∨ e'.

d. Not ⊢ e' ⊃ h'.

e. Not ⊢ e ⊃ ~h.

f. Either not ⊢ e . h ⊃ e' . h', or not ⊢ e' ⊃ h' ∨ e.
(From T6.)




§ 83. Further Theorems on Comparative Concepts


A. Some further theorems are given concerning the concepts MC, Eq, Gr, 
and comparability, defined in the two preceding sections. B. It is shown that all 
axioms of B. O. Koopman’s axiom system for “intuitive probability” are here 
provable as theorems on MC. 


A. Theorems 


This section contains some further theorems concerning MC and the
three other comparative concepts introduced in the preceding section.
Most of these theorems, like those in the preceding section, connect these
concepts with L-concepts; thus they build a bridge between comparative
inductive logic and deductive logic. Other theorems attribute to the
comparative concepts (as relations between pairs of sentences) relational
characteristics like reflexivity or transitivity. All theorems of this section
hold for any finite or infinite system 𝔏.

T83-1. Let ⊢ e ⊃ e'. MC(h,e,h',e') if and only if e is non-L-false and
either ⊢ e ⊃ h or ⊢ e' . h' ⊃ e . h.
Proof. From D81-1. Of the conditions in this definition, only (a), (c1), and
the first part of (c3) are here stated explicitly. (b) and the second part of (c3)
are here omitted (T21-5p(2)) because they follow from the assumption that
⊢ e ⊃ e' and (a). (c2) is omitted (T21-5p(1)) because the stated first part of
(c3) follows from it (T21-5g(2)).


The following theorem states the invariance of MC with respect to
replacements of arguments by any L-equivalent ones. The same holds
obviously for Eq and Gr since they are defined in terms of MC (D82-1 and 2).


T83-2. Let h_1 be L-equivalent to h_2, likewise e_1 to e_2, h_1' to h_2', and
e_1' to e_2'. Then MC(h_1,e_1,h_1',e_1') if and only if MC(h_2,e_2,h_2',e_2').
(From D81-1.)


T83-3. Let h be L-equivalent to h', and likewise e to e', and e be
non-L-false.

a. MC(h,e,h',e'). (From D81-1, since (c3) is fulfilled.)

b. Eq(h,e,h',e'). (From (a).)


The following theorem says that MC and Eq are reflexive relations
(between pairs of sentences).


T83-4. Let e be non-L-false, and h be any sentence.
a. MC(h,e,h,e). (From T3.)
b. Eq(h,e,h,e). (From (a).)


It seems that scientists use comparative concepts of confirmation mostly
in cases of two special kinds: (1) two hypotheses are compared on the
same evidence, or (2) two evidences are compared with respect to the 
same hypothesis. Therefore we shall now state some theorems (T6 to 
T13) for the case that the same evidence occurs in both pairs. Because of 
T2, analogous theorems hold, of course, when different but L-equivalent 
evidences e and e’ occur. Later theorems will deal with the case that the 
same hypothesis occurs in both pairs. 


+T83-6. MC(h,e,h',e) if and only if the following two conditions are
fulfilled.

a. e is not L-false.

b. ⊢ e . h' ⊃ h.


Proof. 1. Suppose that MC(h,e,h',e). Then (a) follows from D81-1a, and (b)
from D81-1c. 2. Let (a) and (b) be fulfilled. We have to show that the
conditions in D81-1 are fulfilled with 'e' for 'e''. D81-1a and b correspond to
(a) here. (b) here yields ⊢ e . h' ⊃ e . h; further ⊢ e ⊃ h ∨ e (T21-4a); thus
D81-1c3 follows, hence D81-1c.

The following theorem T7 is analogous to T6 but deals with the
converse of MC. It is merely a lemma for subsequent theorems.

T83-7. Lemma. MC(h',e,h,e) if and only if the following two conditions
are fulfilled.

a. e is not L-false.

b. ⊢ e . h ⊃ h'.
(From T6.)


The following is a corollary to T6. 


T83-8. Let e be non-L-false, h and i arbitrary.

a. MC(h,e,h . i,e).

b. MC(h ∨ i,e,h,e).
(From T6.)

(a) says that if we join a conjunctive component to a given hypothesis,
then the confirmation is at most as high as before; (b) says that if we join
a disjunctive component to a hypothesis, the confirmation is at least as
high as before.
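
For the case of common evidence, T83-6 and its corollary T83-8 can be confirmed by enumeration in a small system. The sketch below is an added illustration under the assumption of a toy language with two atomic sentences (16 possible ranges); the helper names are hypothetical.

```python
from itertools import product

FULL = 0b1111                      # four state-descriptions (two assumed atoms)

def neg(a):
    return FULL & ~a

def imp(a, b):
    """L-implication as inclusion of ranges."""
    return (a & neg(b)) == 0

def mc(h, e, h2, e2):
    """MC(h,e,h',e') by D81-1."""
    return (e != 0 and e2 != 0 and
            (imp(e, h) or imp(e2, neg(h2))
             or (imp(e2 & h2, e & h) and imp(e, h | e2))))

# T83-6: MC(h,e,h',e) iff e is non-L-false and e.h' L-implies h.
t6_ok = all(mc(h, e, h2, e) == (e != 0 and imp(e & h2, h))
            for h, e, h2 in product(range(16), repeat=3))

# T83-8: for non-L-false e, MC(h,e,h.i,e) and MC(h v i,e,h,e).
t8_ok = all(mc(h, e, h & i, e) and mc(h | i, e, h, e)
            for h, i in product(range(16), repeat=2)
            for e in range(1, 16))
print(t6_ok, t8_ok)
```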

+T83-9. Eq(h,e,h',e) if and only if the following three conditions are
fulfilled.

a. e is not L-false.

b. ⊢ e . h' ⊃ h.

c. ⊢ e . h ⊃ h'. ((b) and (c) may be combined as follows: ⊢ e ⊃ (h ≡ h').)

(From T6, T7.)

The following is a corollary to T9.

T83-10. If e is non-L-false and ⊢ e . h ⊃ i, then Eq(h . i,e,h,e).
(From T9.)

+T83-11. Gr(h,e,h',e) if and only if the following two conditions are
fulfilled.

a. ⊢ e . h' ⊃ h.

b. Not ⊢ e . h ⊃ h'. (Hence e . h and e are not L-false.) (From T6, T7.)

Example (previously mentioned in § 81). e is 'P1a'; h is 'P2a ∨ P3a'; h' is
'P2a'. According to T11, Gr(h,e,h',e).


T83-12. The pairs h,e and h',e are comparable if and only if the
following two conditions are fulfilled.

a. e is not L-false.

b. Either ⊢ e . h' ⊃ h or ⊢ e . h ⊃ h'.
(From T6, T7.)

T83-13. The pairs h,e and h',e are incomparable if and only if either e
is L-false or the following two conditions are fulfilled simultaneously:

a. Not ⊢ e . h' ⊃ h.

b. Not ⊢ e . h ⊃ h'.
(From T12.)


T12 and T13 say in effect that on the basis of a given (non-L-false)
evidence, two hypotheses can be compared only if one of them follows from
the other together with the evidence. This seems quite plausible; because
otherwise the ranges of e . h and of e . h' have nonoverlapping parts, and
therefore we can, by the choice of suitable measure functions, give at will
a greater measure either to the one or to the other.

The following theorems T15 to T18 deal with the case that the same
hypothesis occurs in both pairs. Analogous theorems hold, of course, when
different but L-equivalent hypotheses h and h' occur.

+T83-15. MC(h,e,h,e') if and only if e and e' are non-L-false and, in
addition, at least one of the following three conditions is fulfilled:

Either a. ⊢ e ⊃ h;
or b. ⊢ e' ⊃ ~h;
or c. ⊢ e' . h ⊃ e and ⊢ e ⊃ h ∨ e'.
(From D81-1.)


The condition (c) says the following about the three sentences h, e’, 
and ~e; both their disjunction and the disjunction of their negations are 
L-true; in other words, it is logically necessary that at least one of the 
three sentences is true and at least one is false. 

We found earlier (T7) that, on the basis of the same evidence, a stronger
hypothesis is confirmed at most as much as a weaker one, e.g., h . i in
comparison to h (T8a) or h in comparison to h ∨ i (T8b). There are no
simple analogues for the comparison of two evidences with respect to the
same hypothesis. e . i may give either more or less support to h than e,
depending on whether the additional evidence i is positively or negatively
relevant to h.


T83-16. Eq(h,e,h,e') if and only if e and e' are non-L-false and, in
addition, at least one of the following three conditions is fulfilled:
Either a. ⊢ e ⊃ h and ⊢ e' ⊃ h;
or b. ⊢ e ⊃ ~h and ⊢ e' ⊃ ~h;
or c. ⊢ e ≡ e'.
(From T82-4.)


T83-17. Gr(h,e,h,e') if and only if the following four conditions are
fulfilled simultaneously.
a. Not ⊢ e ⊃ ~h. (Hence e is not L-false.)

b. Not ⊢ e' ⊃ h. (Hence e' is not L-false.)

c. Either (c1) ⊢ e ⊃ h,
   or (c2) ⊢ e' ⊃ ~h,
   or (c3) ⊢ e' . h ⊃ e and ⊢ e ⊃ h ∨ e'.
d. Either (d1) not ⊢ e . h ⊃ e',
   or (d2) not ⊢ e' ⊃ h ∨ e.
(From T82-5.)

(The condition (c3) here is the same as T15c; see the remark made
there.)

For an example for T17, see the second of the three examples given at the end
of § 81.

T83-18. The pairs h,e and h,e' are comparable if and only if e and e' are
non-L-false and, in addition, at least one of the following six conditions is
fulfilled:

Either a. ⊢ e ⊃ h;
or b. ⊢ e' ⊃ ~h;
or c. ⊢ e' . h ⊃ e and ⊢ e ⊃ h ∨ e';
or d. ⊢ e' ⊃ h;
or e. ⊢ e ⊃ ~h;
or f. ⊢ e . h ⊃ e' and ⊢ e' ⊃ h ∨ e.
(From T82-6.)

The customary concepts of the logic of relations usually applied to
dyadic relations (see D25-2) may be applied to our present comparative
concepts if we regard the latter as dyadic relations between pairs of
sentences, as earlier explained. In this sense, MC is reflexive (T4a) and
transitive (T22). This seems plausible since MC is analogous to the relation ≥.


T83-22. If MC(h,e,h',e') and MC(h',e',h'',e''), then MC(h,e,h'',e'').


Proof. This theorem can easily be proved by two applications of T81-1a and
one of T81-1b, referring to c-functions. However, we shall give here a proof
which is based directly on D81-1 and does not involve any quantitative
concept but stays within the boundaries of comparative inductive logic. Let the
two conditions be fulfilled; we call them (1) and (2). Then e and e'' are not
L-false (D81-1a and b). We have moreover to prove that the condition D81-1c is
fulfilled for the assertion, that is, that either (c1) ⊢ e ⊃ h, or
(c2) ⊢ e'' ⊃ ~h'', or (c3) ⊢ e'' . h'' ⊃ e . h and ⊢ e ⊃ h ∨ e''. Suppose that
(c1) and (c2) are not fulfilled; we have to show that then (c3) is fulfilled.
It follows from condition (1), according to D81-1c, that either (1A) ⊢ e ⊃ h or
(1B) ⊢ e' ⊃ ~h' or (1C) both (1C1) ⊢ e' . h' ⊃ e . h and (1C2) ⊢ e ⊃ h ∨ e'.
Likewise, it follows from (2) that either (2A) ⊢ e' ⊃ h' or (2B) ⊢ e'' ⊃ ~h''
or (2C) both (2C1) ⊢ e'' . h'' ⊃ e' . h' and (2C2) ⊢ e' ⊃ h' ∨ e''. (1A) is the
same as (c1) and hence is not fulfilled, according to our assumption; likewise
(2B) since it is the same as (c2). Therefore the following holds: (1') either
(1B) or (1C); and (2') either (2A) or (2C); hence also (3') either (2A) or
(2C1). Now it can be shown that (1B) is not fulfilled. For if it were, it would
hold, according to (3'), either together with (2A) or together with (2C1).
However, in the first case, e' would be L-false, which it is not. And in the
second case, ⊢ ~(e' . h') (T21-5f(3)), hence with (2C1) ⊢ ~(e'' . h'')
(T21-5h(1)), hence ⊢ e'' ⊃ ~h'', which is not the case since, according to our
assumption, (c2) is not fulfilled. Thus from (1') we obtain (1C), i.e., both
(1C1) and (1C2). It can further be shown that (2A) is not fulfilled. For, from
this together with (1C1) it would follow that ⊢ e' ⊃ h (T21-5i(1)), and hence
with (1C2) ⊢ e ⊃ h, which is not the case since (c1) is not fulfilled. Thus
from (2') we obtain (2C), i.e., both (2C1) and (2C2). Hence our result is:
(1C1) and (1C2) and (2C1) and (2C2). It follows from (2C1) and (1C1) that
⊢ e'' . h'' ⊃ e . h; this is the first part of the condition (c3) mentioned
above. From (1C1) it follows that ⊢ h' . e' ⊃ h; from (2C2) that
⊢ e' ⊃ (h' . e') ∨ e''. From these two results we see that ⊢ e' ⊃ h ∨ e''.
From this and (1C2) we find that ⊢ e ⊃ h ∨ (h ∨ e''), hence ⊢ e ⊃ h ∨ e''.
This is the second part of (c3). Thus it has been proved that (c3) is fulfilled.


Eq, again with respect to pairs of sentences as members, is reflexive
(T4b), symmetrical (T23), and transitive (T24). Thus it has the structural
properties of an equality relation. This seems plausible since Eq
corresponds in a certain sense to equality of confirmation, as we have seen
earlier (T82-1a).

T83-23. If Eq(h,e,h',e'), then Eq(h',e',h,e). (From D82-1.)

T83-24. If Eq(h,e,h',e') and Eq(h',e',h'',e''), then Eq(h,e,h'',e''). (From
D82-1, T22.)

The relation Gr is asymmetrical (T26), irreflexive (T27), and transitive
(T29). This is plausible since Gr is in a certain sense analogous to the
relation >.

T83-26. If Gr(h,e,h',e'), then not Gr(h',e',h,e). (From D82-2.)

T83-27. Not Gr(h,e,h,e). (From T26.)

T83-28.

a. If Gr(h,e,h',e') and MC(h',e',h'',e''), then Gr(h,e,h'',e'').

b. If MC(h,e,h',e') and Gr(h',e',h'',e''), then Gr(h,e,h'',e'').

(From D82-2, T22.)

T83-29. If Gr(h,e,h',e') and Gr(h',e',h'',e''), then Gr(h,e,h'',e''). (From
D82-2, T28a.)
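
The relational properties stated in T83-4a, T83-22, and T83-28a can be checked by enumeration once MC is viewed as a dyadic relation between pairs. The following Python sketch is an added illustration, assuming a toy language with two atomic sentences and restricting attention to pairs h,e with non-L-false e (MC never holds otherwise); the names `pairs` and `succ` are hypothetical.

```python
FULL = 0b1111                      # four state-descriptions (two assumed atoms)

def neg(a):
    return FULL & ~a

def imp(a, b):
    """L-implication as inclusion of ranges."""
    return (a & neg(b)) == 0

def mc(h, e, h2, e2):
    """MC(h,e,h',e') by D81-1."""
    return (e != 0 and e2 != 0 and
            (imp(e, h) or imp(e2, neg(h2))
             or (imp(e2 & h2, e & h) and imp(e, h | e2))))

# All pairs (h, e) with non-L-false evidence e.
pairs = [(h, e) for h in range(16) for e in range(1, 16)]
# succ[p] is the set of pairs q such that MC holds between p and q.
succ = {p: {q for q in pairs if mc(*p, *q)} for p in pairs}

reflexive = all(p in succ[p] for p in pairs)                     # T83-4a
transitive = all(succ[q].issubset(succ[p])                       # T83-22
                 for p in pairs for q in succ[p])
# T83-28a: Gr between p and q, and MC between q and r, give Gr between p and r.
t28a = all(r in succ[p] and p not in succ[r]
           for p in pairs for q in succ[p] if p not in succ[q]
           for r in succ[q])
print(reflexive, transitive, t28a)
```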


B. Koopman’s Axiom System 


B. O. Koopman [Axioms] has constructed an axiom system with one
primitive concept, which is explained as follows: “a on the presumption
that h is true is equally or less probable than b on the presumption that k
is true”. This relation is obviously the converse of that which we have
taken as explicandum (see (1) in § 80) and for which we have proposed
the concept MC as an explicatum. Therefore it is interesting to examine
whether the converse of MC fulfils Koopman’s axioms. The following
theorems together with two earlier ones show that this is indeed the case.
Koopman’s axiom V, if reformulated by taking instead of his primitive
relation the converse of MC, is our subsequent theorem T31; likewise, his
axiom I is T32; axiom R is T4a; T is T22; A is T33; C1 is T34; C2 is T35;
D is T36; P is T37b; axiom S is a special case of T38c for special
arguments h and e.

T83-31. Let e and e' be non-L-false. Then for any h', MC(e,e,h',e').
(From D81-1a, b, c1.)


T83-32. If MC(h,e,e',e'), then ⊢ e ⊃ h.


Proof. Let the condition be fulfilled. Then (D81-1) either (c1) ⊢ e ⊃ h, or
(c2) ⊢ e' ⊃ ~e', or (c3): (1) ⊢ e' ⊃ e . h and (2) ⊢ e ⊃ h ∨ e'. (c1) is the
assertion. (c2) is impossible because e' would be L-false, in contradiction to
D81-1b. In the case (c3), ⊢ e' ⊃ h (1) and hence ⊢ e' ≡ e' . h (T20-21);
therefore ⊢ e ⊃ h ∨ (e' . h) (2); hence ⊢ e ⊃ h (T21-5p(1)).


T83-33. If MC(h,e,h',e'), then MC(~h',e',~h,e).

Proof. Let the condition be fulfilled. Then (D81-1c) either (c1) ⊢ e ⊃ h, or
(c2) ⊢ e' ⊃ ~h', or (c3): (1) ⊢ e' . h' ⊃ e . h and (2) ⊢ e ⊃ h ∨ e'. In the
case (c1) the assertion holds (D81-1c2); likewise in the case (c2) (D81-1c1).
In the case (c3), the following assertions (3), (4), (5), and (6) hold.
(3) ⊢ e . ~h ⊃ e' (from (2), T21-5h(7)). From (1): ⊢ e' . h' ⊃ h; hence
(4) ⊢ e' . ~h ⊃ ~h' (T21-5h(6)). From (3) and (4): (5) ⊢ e . ~h ⊃ e' . ~h'
(T21-5m(8)). From (1): ⊢ e' . h' ⊃ e, hence (6) ⊢ e' ⊃ ~h' ∨ e (T21-5h(8)).
Therefore the assertion holds (from (5), (6), D81-1c3).
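
The three theorems T83-31 to T83-33, which correspond to Koopman's axioms V, I, and A, lend themselves to the same kind of finite check. The sketch below is an added illustration under the assumption of a toy language with two atomic sentences; the helper names are hypothetical, and T83-32 is read as concerning MC(h,e,e',e'), in agreement with its proof.

```python
from itertools import product

FULL = 0b1111                      # four state-descriptions (two assumed atoms)

def neg(a):
    return FULL & ~a

def imp(a, b):
    """L-implication as inclusion of ranges."""
    return (a & neg(b)) == 0

def mc(h, e, h2, e2):
    """MC(h,e,h',e') by D81-1."""
    return (e != 0 and e2 != 0 and
            (imp(e, h) or imp(e2, neg(h2))
             or (imp(e2 & h2, e & h) and imp(e, h | e2))))

# T83-31: for non-L-false e and e', MC(e,e,h',e') for any h'.
t31 = all(mc(e, e, h2, e2)
          for e in range(1, 16) for e2 in range(1, 16) for h2 in range(16))

# T83-32: if MC(h,e,e',e'), then e L-implies h.
t32 = all(imp(e, h)
          for h, e, e2 in product(range(16), repeat=3) if mc(h, e, e2, e2))

# T83-33: if MC(h,e,h',e'), then MC(~h',e',~h,e).
t33 = all(mc(neg(h2), e2, neg(h), e)
          for h, e, h2, e2 in product(range(16), repeat=4) if mc(h, e, h2, e2))
print(t31, t32, t33)
```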
T83-34. Let e . h . j and e' . h' . j' be non-L-false. If (1) MC(h,e,h',e')
and (2) MC(j,e . h,j',e' . h'), then MC(h . j,e,h' . j',e').

Proof. Let the conditions be fulfilled. Condition (1) says (D81-1c) that either
(1a) ⊢ e ⊃ h, or (1b) ⊢ e' ⊃ ~h', or (1c) both (1c1) ⊢ e' . h' ⊃ e . h and
(1c2) ⊢ e ⊃ h ∨ e'. Condition (2) says that either (2a) ⊢ e . h ⊃ j, or
(2b) ⊢ e' . h' ⊃ ~j', or (2c) both (2c1) ⊢ e' . h' . j' ⊃ e . h . j and
(2c2) ⊢ e . h ⊃ j ∨ (e' . h'). (1b) is impossible because e' . h' would be
L-false, and hence e' . h' . j' too; similarly, (2b) is impossible. Thus there
remain four combinations of a case (1) with a case (2). The assertion says that
either (A1) ⊢ e ⊃ h . j, or (A2) ⊢ e' ⊃ ~(h' . j') (which is impossible), or
(A3) both (A3a) ⊢ e' . h' . j' ⊃ e . h . j and (A3b) ⊢ e ⊃ (h . j) ∨ e'. We
have to show that in each of the four cases either (A1) or (A3) holds. I: From
(1a) and (2a), (A1) follows. II: (1a) and (2c). (2c1) is (A3a). (A3b) follows
from (2c2) and (1a). III: (1c) and (2a). (A3a) follows from (1c1) and (2a).
(A3b) follows from (1c2) and (2a). IV: (1c) and (2c). (2c1) is (A3a). (A3b)
follows from (1c2) and (2c2). Thus the assertion is proved.

T83-35. Let e . h . j and e' . h' . j' be non-L-false. If (1) MC(j,e . h,h',e')
and (2) MC(h,e,j',e' . h'), then MC(h . j,e,h' . j',e').


Proof, similar to the preceding theorem. (1) says that either (1a) ⊢ e . h ⊃ j,
or (1b) ⊢ e' ⊃ ~h', or (1c) both (1c1) ⊢ e' . h' ⊃ e . h . j and
(1c2) ⊢ e . h ⊃ j ∨ e'. (2) says that either (2a) ⊢ e ⊃ h, or
(2b) ⊢ e' . h' ⊃ ~j', or (2c) both (2c1) ⊢ e' . h' . j' ⊃ e . h and
(2c2) ⊢ e ⊃ h ∨ (e' . h'). (1b) and (2b) are impossible. Thus four combinations
remain. The assertion is the same as in T34; see there for (A1), etc. I: From
(1a) and (2a), (A1) follows. II: (1a) and (2c). Here (A3a) follows from (2c1)
and (1a); (A3b) follows from (2c2) and (1a). III: (1c) and (2a). Here (A3a)
follows from (1c1); (A3b) follows from (2a) and (1c2). IV: (1c) and (2c). Here
(A3a) follows from (1c1); (A3b) follows from (2c2) and (1c2).


T83-36. Let e . h . j and e' . h' . j' be non-L-false, and let
MC(h . j,e,h' . j',e'). Consider the following two pairs of sentences:

(i) h',e'; j',e' . h'.

(ii) h,e; j,e . h.
If MC holds between either pair in (i) and either pair in (ii), then MC
holds likewise between the remaining pair in (ii) and the remaining pair
in (i).

Proof. This theorem is a combination of four statements. We shall give the
proof for one; the proofs for the three others are similar. The one statement is
this: 'If (2) MC(h',e',h,e), then (A) MC(j,e . h,j',e' . h')'. Let (1) be the
MC-condition mentioned in the first sentence of the theorem; it says that either
(1a) ⊢ e ⊃ h . j, or (1b) ⊢ e' ⊃ ~(h' . j'), or (1c) both
(1c1) ⊢ e' . h' . j' ⊃ e . h . j and (1c2) ⊢ e ⊃ (h . j) ∨ e'. (2) says that
either (2a) ⊢ e' ⊃ h', or (2b) ⊢ e ⊃ ~h, or (2c) both (2c1) ⊢ e . h ⊃ e' . h'
and (2c2) ⊢ e' ⊃ h' ∨ e. (1b) and (2b) are impossible. The assertion (A) says
that either (A1) ⊢ e . h ⊃ j, or (A2) ⊢ e' . h' ⊃ ~j' (which is impossible), or
(A3) both (A3a) ⊢ e' . h' . j' ⊃ e . h . j and (A3b) ⊢ e . h ⊃ j ∨ (e' . h').
I: From (1a), (A1) follows. II: (1c) and (2a). (A3a) is the same as (1c1);
(A3b) follows from (1c2) and (2a); hence (A3) follows. III: (1c) and (2c).
(A3a) is (1c1); (A3b) follows from (2c1).


T83-37. Let MC(h,e,h',e' . i).
a. If MC(h,e,h',e' . j), then MC(h,e,h',e' . (i ∨ j)).


Proof. Let the initial assumption be (1), the condition in (a) be (2), and the
assertion in (a) be (A). (1) says that either (1a) ⊢ e ⊃ h, or
(1b) ⊢ e' . i ⊃ ~h', or (1c) both (1c1) ⊢ e' . h' . i ⊃ e . h and
(1c2) ⊢ e ⊃ h ∨ (e' . i). (2) says that either (2a) ⊢ e ⊃ h, or
(2b) ⊢ e' . j ⊃ ~h', or (2c) both (2c1) ⊢ e' . h' . j ⊃ e . h and
(2c2) ⊢ e ⊃ h ∨ (e' . j). (A) says that (A1) ⊢ e ⊃ h, or
(A2) ⊢ e' . (i ∨ j) ⊃ ~h', or (A3) both (A3a) ⊢ e' . h' . (i ∨ j) ⊃ e . h and
(A3b) ⊢ e ⊃ h ∨ (e' . (i ∨ j)). (A2) says that both (A2a) ⊢ e' . i ⊃ ~h' and
(A2b) ⊢ e' . j ⊃ ~h' (T21-5n(3)). (A3a) says that both
(A3a1) ⊢ e' . h' . i ⊃ e . h and (A3a2) ⊢ e' . h' . j ⊃ e . h (T21-5n(3)).
(1a), (2a), and (A1) are the same. Thus four cases remain. I: (1b) and (2b).
They are the same as (A2a) and (A2b) respectively; hence (A2) follows. II: (1b)
and (2c). (A3a1) follows from (1b). (A3a2) is (2c1). (A3b) follows from (2c2).
Hence (A3) follows. III: (1c) and (2b). (A3a1) is (1c1). (A3a2) follows from
(2b). (A3b) follows from (1c2). Hence (A3) follows. IV: (1c) and (2c). (A3a1)
is (1c1). (A3a2) is (2c1). (A3b) follows from (1c2). Hence (A3) follows.

b. If MC(h,e,h',e' . ~i), then MC(h,e,h',e').
(From (a), with '~i' for 'j'.)

T83-38. Let j_1, ..., j_n (n ≥ 2) fulfil the following conditions. (1) These
sentences are L-exclusive in pairs. (2) j, that is, j_1 ∨ ... ∨ j_n, is
non-L-false (hence j_1, ..., j_n are non-L-false). (3) For every p from 1 to
n − 1, MC(j_{p+1},j,j_p,j). Then the following holds.

a. For every p from 1 to n − 1, ⊢ j_p ⊃ j_n.

Proof. For every p, the following holds. ⊢ j . j_p ⊃ j_{p+1} (from (3), T83-6b).
j . j_p is L-equivalent to (j_1 ∨ ... ∨ j_n) . j_p, hence to
(j_1 . j_p) ∨ ... ∨ (j_n . j_p), hence to j_p . j_p, because all other
conjunctions in this disjunction are L-false (from (1), T21-5r(2)); hence to
j_p. Therefore ⊢ j_p ⊃ j_{p+1}. Hence the assertion (with T20-2b, by
mathematical induction).

b. ⊢ j ⊃ j_n.

Proof. For every p from 1 to n, ⊢ j_p ⊃ j_n (from (a) and ⊢ j_n ⊃ j_n).
Therefore ⊢ j_1 ∨ ... ∨ j_n ⊃ j_n (T21-5o(4)); this is the assertion.

c. For any h and non-L-false e, MC(j_n,j,h,e). (From (b), D81-1c1.)

Since Koopman’s axioms hold in the present system of comparative 
confirmation, the theorems which he derives from the axioms hold like- 
wise. However, with respect to the nature and function of the theory, 
there are some differences between the conception presented here and 
that of Koopman. He believes that the theory can only supply conditional 
statements concerning the comparative concept of confirmation; direct 
comparative statements of the form “h is confirmed by e at least as strongly 
as h' by e” are not supplied by his theory. Statements of this kind cannot, 
in his opinion, be obtained with the help of any general principle, be it a 
principle of probability, of logic, or of experimental science; they can be 
obtained only by intuition. The results of this special kind of intuition 
seem to be regarded as not subject to rational examination (except for 
questions of consistency) and therefore not capable of rational reconstruc- 
tion. This view is similar to, and probably influenced by, Keynes’s con- 
ception of probability as undefinable and based on intuition. In contrast 
to Koopman’s view, I am convinced that it is possible to give a rational 
reconstruction or explication for the comparative concept of confirmation, 
and I believe, moreover, that it is possible to define an explicatum without 
using any other terms than those of deductive logic, hence, of semantics. 
The concept MC is here proposed merely as a tentative explicatum. 
If it is found not to be quite adequate, it will be replaced by a more adequate 
explicatum. But I fail to see at present any reasons for the view 
that it should be impossible in principle to construct an adequate explicatum. 
Two more concepts are introduced into comparative inductive logic, expressing 
maximum and minimum confirmation, respectively. ‘Max(h,e)’ is the comparative 
analogue to the quantitative statement ‘c(h,e) = 1’ (for all c-functions); 
‘Min(h,e)’ is the analogue to ‘c(h,e) = 0’. The definitions of these two 
concepts (D1, D2) do not, however, refer to c-functions but only to L-concepts. 

We shall here introduce two more concepts into comparative inductive 
logic. They are L-semantical concepts like MC and the other concepts 
earlier defined; in distinction to those concepts, they are relations between 
two sentences, not four. They are intended to express maximum 
and minimum confirmation, respectively, of a hypothesis h on an evidence e. 
More exactly speaking, their connection with the quantitative 
c-concepts is intended to be as follows. We shall say that h has the maximum 
confirmation on evidence e, in symbols of the metalanguage: 
‘Max(h,e)’, if and only if, for every regular c, c(h,e) has the maximum value, 
that is, 1. Analogously, we shall say that h has the minimum confirmation 
on evidence e, in symbols ‘Min(h,e)’, if and only if, for every regular c, 
c(h,e) has the minimum value, that is, 0. The reference to all regular 
c-functions here is analogous to that in the requirements R80-1 and 2 for 
MC. Here, as there, these conditions referring to quantitative concepts 
are merely meant as requirements of adequacy but are not taken as definitions. 
The definitions will here again be purely comparative; it will then 
be shown later that these comparative concepts fulfil the quantitative 
requirements just stated (T4). 

In terms of MC, the two new concepts are intended to fulfil the following 
requirements, which obviously correspond to the quantitative requirements 
stated above. Max(h,e) is to hold if and only if the pair 
of sentences h,e bears the relation MC to every pair h',e' (where e' is not 
L-false). Analogously, Min(h,e) is to hold if and only if every pair h',e' 
(where e' is not L-false) bears the relation MC to h,e. Since these conditions 
are in purely comparative terms, we could take them as comparative 
definitions for ‘Max’ and ‘Min’. Instead, we shall define these two predicates, 
as we did with ‘MC’, directly on the basis of the old L-concepts 
(D1, D2); these definitions hold for both finite and infinite systems 𝔏. It 
will then be shown that the two concepts thus defined fulfil the requirements 
just stated in terms of MC (T2, T3). 

+D84-1. Max(h,e) (with respect to a system 𝔏) =Df the following two 
conditions are fulfilled (in 𝔏): 

a. e is not L-false. 

b. ⊢ e ⊃ h. 

+D84-2. Min(h,e) (with respect to a system 𝔏) =Df the following two 
conditions are fulfilled (in 𝔏): 

a. e is not L-false. 

b. ⊢ e ⊃ ~h. 

The following theorems T1 to T3 hold for any finite or infinite system 𝔏. 


T84-1. Lemma. 

a. Max(h,e) does not hold if and only if either e is L-false or not ⊢ e ⊃ h. 
(From D1.) 

b. Min(h,e) does not hold if and only if either e is L-false or not ⊢ e ⊃ 
~h. (From D2.) 

c. Neither Max(h,e) nor Min(h,e) if and only if either e is L-false, or 
neither ⊢ e ⊃ h nor ⊢ e ⊃ ~h. (From (a), (b).) 


+T84-2. Max(h,e) if and only if, for all sentences h' and e' (in the 
system 𝔏 in question), where e' is not L-false, MC(h,e,h',e'). 

Proof. 1. Suppose that Max(h,e) and that e' is not L-false. Then MC(h,e,h',e') 
(D1, D81-1c₁). 2. Suppose that MC(h,e,h',e') for every h' and e', where e' is not 
L-false. Then e is not L-false (D81-1a). Now we take for both e' and h' the 
tautological sentence ‘t’. Then in D81-1 (c₂) is impossible; hence either (c₁) or 
(c₃) holds. The first part of (c₃) says that ⊢ t ⊃ e . h, hence ⊢ h, hence ⊢ e ⊃ h. 
This is likewise stated by (c₁); therefore it must hold. Hence Max(h,e) (D1). 


+T84-3. Min(h',e') if and only if, for all sentences h and e (in 𝔏), 
where e is not L-false, MC(h,e,h',e'). 

Proof. 1. Suppose that Min(h',e') and that e is not L-false. Then MC(h,e,h',e') 
(D2, D81-1c₂). 2. Suppose that MC(h,e,h',e') for every h and e, where e is not 
L-false. Then e' is not L-false (D81-1b). Now we take for ‘h’ ‘~t’, and for ‘e’ ‘t’. 
Then in D81-1 (c₁) is impossible; hence either (c₂) or (c₃) holds. From the first 
part of (c₃) it follows that ⊢ e' . h' ⊃ ~t, hence ⊢ e' ⊃ ~h'. This is likewise 
stated by (c₂); therefore it must hold. Hence Min(h',e') (D2). 

The following theorems T4a and b state the connections between the 
two new concepts and the c-functions. They show that the new concepts 
fulfil the quantitative requirements laid down earlier. 


+T84-4. Let h and e be any sentences in a finite system 𝔏_N or nongeneral 
sentences in 𝔏_∞. 
a. Max(h,e) if and only if, for every regular c, c(h,e) = 1. (From D1, 
T59-1b; T59-5b.) 
b. Min(h,e) if and only if, for every regular c, c(h,e) = 0. (From D2, 
T59-1e; T59-5c.) 
Both (a) and (b) hold likewise with ‘for some’ instead of ‘for every’. 
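
The definitions D84-1 and D84-2 and their connection with c-values (T84-4) can be tried out on a very small language. The following Python sketch is only an illustration under stated assumptions, not part of the system itself: it assumes a toy language with two atomic sentences, represents each sentence by its range (the set of state-descriptions in which it holds), and takes a regular c-function to be the quotient of a measure that gives positive weight to every state-description. All names (`sentence`, `is_max`, `is_min`, `uniform`, `skewed`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumption: a toy language with two atomic sentences p, q.
# Each state-description is one truth assignment (p, q).
STATES = list(product([False, True], repeat=2))

def sentence(pred):
    """Range of a sentence: the set of state-descriptions in which it holds."""
    return frozenset(s for s in STATES if pred(*s))

def neg(a):
    """Range of the negation of a."""
    return frozenset(STATES) - a

def l_implies(a, b):
    """L-implication: the range of a is included in the range of b."""
    return a <= b

def is_max(h, e):
    """D84-1: e is not L-false, and e L-implies h."""
    return bool(e) and l_implies(e, h)

def is_min(h, e):
    """D84-2: e is not L-false, and e L-implies ~h."""
    return bool(e) and l_implies(e, neg(h))

def c(m, h, e):
    """c(h,e) as a quotient of measures; m must be positive on every state."""
    return sum(m[s] for s in e & h) / sum(m[s] for s in e)

p = sentence(lambda p, q: p)
q = sentence(lambda p, q: q)
p_and_q = p & q
p_or_q = p | q

# Two illustrative regular measures (hypothetical weights summing to 1).
uniform = {s: Fraction(1, 4) for s in STATES}
skewed = dict(zip(STATES, [Fraction(1, 10), Fraction(2, 10),
                           Fraction(3, 10), Fraction(4, 10)]))

for m in (uniform, skewed):
    # Maximum confirmation goes with c = 1 (T84-4a).
    assert is_max(p_or_q, p) and c(m, p_or_q, p) == 1
    # Minimum confirmation goes with c = 0 (T84-4b).
    assert is_min(neg(p), p_and_q) and c(m, neg(p), p_and_q) == 0
    # Neither Max nor Min: the c-value lies strictly between 0 and 1.
    assert not is_max(q, p) and not is_min(q, p) and 0 < c(m, q, p) < 1
```

Only two sample measures are tried here, so the loop illustrates T84-4 rather than proving it; the theorem itself speaks of every regular c.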
§ 85. Correspondence between Comparative and Quantitative Theorems 


Further theorems concerning the comparative concepts are stated. Some 
theorems in this and earlier sections of this chapter correspond to certain 
theorems on regular c-functions in the sense that MC, Gr, and Eq correspond 
to the relations ≥, >, and =, respectively, between c-values, and Max and 
Min correspond to the c-values 1 and 0, respectively. 


Some further purely comparative theorems will here be stated. They 
involve the comparative concepts (MC, Eq, Gr, comparability, Max, Min) 
but no c-functions or other quantitative concepts. These theorems hold 
for any finite or infinite system 𝔏. 

The theorems on regular c-functions which have been stated in chap- 
ter v are of two different kinds. For some of them it is essential that the 
c-functions are quantitative, that is to say, that they have numerical 
values. This holds for all those theorems which refer to an arithmetical 
function (sum, difference, product, quotient, and the like) of values of 
c-functions (e.g., T59-1k, l, m, n, p, T59-2a, b, g, k, T59-3a, T60-1, T60-2, 
T60-3, T60-5, T60-6, T61-1b, c, d, T61-3a, b, c, d, T61-5a, b, T61-6a, 
b, f, T61-7). However, there are other theorems which treat of c-functions 
in a comparative way, so to speak. They say, for instance, that a certain 
c-value is equal to another or that it is greater than another. To this kind 
belong also those theorems which ascribe the c-value 1 (because this is the 
same as saying that the c-value in question is higher than or equal to 
every c-value) or the c-value 0. To some of these quantitative theorems 
with a merely comparative content we find corresponding theorems here 
in comparative inductive logic. It seems plausible that to the relations ≥, 
>, and = between c-values, the comparative relations MC, Gr, and Eq, 
respectively, correspond; to a statement ascribing the c-value 1 or 0, 
there is here a corresponding statement attributing the relation Max or 
Min, respectively. The following are examples of this correspondence between 
some theorems of § 59 and certain items (theorems or definitions) 
of the present chapter (earlier sections or this section): as a comparative 
analogue to the quantitative T59-1b and 5b we find here D84-1; to T59-1c: 
T85-2a; to T59-1e and 5c: D84-2; to T59-1f: T85-2b; to T59-1h and i: 
T83-3b; to T59-2d: T83-6; to T59-2e: T83-8a; to T59-2f: T83-8b; to 
T59-2h: T83-10; to T59-2j: T83-9. The analogues of theorems in § 61 
(confirmation of a hypothesis by a predictable observation) are especially 
interesting; they will occur in this section, the correspondence being in- 
dicated by remarks in square brackets. 

It is to be noted that this correspondence between some quantitative 
and comparative theorems is not always a simple translation. Often a 
comparative theorem is weaker than the quantitative theorem to which 
it corresponds. Consider, for instance, a quantitative theorem of the form: 

(1) ‘For every regular c, if c(P₁) ≥ c(P₂), then c(P₃) ≥ c(P₄)’, 

where P₁, P₂, P₃, and P₄ are four pairs of sentences for which certain 
logical relations hold. The corresponding comparative theorem will 
then be: 

(2) ‘If MC(P₁,P₂), then MC(P₃,P₄)’. 

If (2) is translated in terms of c-functions (according to T81-1), it says 
merely this: 

(3) ‘If, for every regular c, c(P₁) ≥ c(P₂), then, for every 
regular c, c(P₃) ≥ c(P₄)’. 

That this is weaker than the quantitative theorem (1) is easily seen by the 
following consideration. Suppose we find a certain regular c-function such 
that c(P₁) ≥ c(P₂), whereas for other c-functions this does not hold. Then 
we can apply the quantitative theorem (1); it yields the result that for 
the given c, c(P₃) ≥ c(P₄). On the other hand, (3) does not say anything 
about this case. 

The correspondence described is more complicated for theorems concerning 
Gr than for those concerning MC or Eq. This is seen from T82-1b 
in comparison with T81-1 and T82-1a. 

Some of our comparative theorems correspond to quantitative theo- 
rems which have already been stated in the classical theory of probability 
or in modern systems by other authors. Among them are the interesting 
theorems stated by Hosiasson as Theorems (f1), (f2), (f3), and (f4) in 
[Confirmation]. It is noteworthy that our corresponding comparative theorems 
(T85-4c, 5c, 10b, and 11d) are proved within a purely comparative theory 
or, in other words, simply in L-semantics, while Hosiasson’s theorems are 
proved on the basis of her quantitative axioms. 


T85-1. 
a. If Max(h,e) and Max(h',e'), then Eq(h,e,h',e'). (From T84-2.) 
b. If Min(h,e) and Min(h',e'), then Eq(h,e,h',e'). (From T84-3.) 


T85-2. Let e be not L-false. 
a. If h is L-true, then Max(h,e). (From D84-1.) [This theorem corresponds 
to the quantitative theorem T59-1c in the sense explained above.] 
b. If h is L-false, then Min(h,e). (From D84-2.) [Corresponds to T59-1f.] 
T85-3. Let MC(h,e,h',e'). 
a. If Max(h',e'), then Max(h,e). 

Proof. The condition implies that ⊢ e' ⊃ h' (D84-1). Because MC holds, 
according to D81-1 either (c₁) ⊢ e ⊃ h; or (c₃) ⊢ e' . h' ⊃ e . h and ⊢ e ⊃ h ∨ e', 
hence with the former result ⊢ e ⊃ h ∨ (e' . h'), hence ⊢ e ⊃ h ∨ h, hence ⊢ e ⊃ h. 
The case (c₂) in D81-1 is here impossible; for it would imply that ⊢ e' ⊃ ~h', 
hence, since ⊢ e' ⊃ h', e' would be L-false, in contradiction to D81-1b. Thus in 
any case ⊢ e ⊃ h. From this together with D81-1a, it follows that Max(h,e). 

b. If Min(h,e), then Min(h',e'). 

Proof. The condition implies that ⊢ e ⊃ ~h. Since MC holds, according to 
D81-1 either (c₂) ⊢ e' ⊃ ~h'; or (c₃) ⊢ e' . h' ⊃ e . h, hence, since ⊢ e ⊃ ~h, 
⊢ e' . h' ⊃ ~h . h, hence ⊢ e' ⊃ ~h'. The case (c₁) in D81-1 is here impossible; 
for it would imply that ⊢ e ⊃ h, hence, since ⊢ e ⊃ ~h, e would be L-false, in 
contradiction to D81-1a. Thus in any case ⊢ e' ⊃ ~h'. Hence, with D81-1b, 
Min(h',e'). 


T85-4. 

a. If Max(h . i,e), then Max(h,e). 

b. If Max(h,e), then Max(h ∨ i,e). 

c. Let e . i be non-L-false. If Max(h,e), then Max(h,e . i). [Corresponds 
to T61-3k.] 
(From D84-1.) 


T85-5. 

a. If Min(h,e), then Min(h . i,e). 

b. If Min(h ∨ i,e), then Min(h,e). 

c. Let e . i be non-L-false. If Min(h,e), then Min(h,e . i). [Corresponds 
to T61-3l.] 
(From D84-2.) 


T85-6. Let e . i not be L-false. If either Max(h,e) or Min(h,e), then 
Eq(h,e,h,e . i). [Corresponds to T65-7 and 8.] (From T4c, T1a; T5c, T1b.) 


The following theorems are chiefly of interest when applied to a situation 
like that described in § 61: h is a hypothesis which may be a deterministic 
or a statistical law; i is a sentence reporting (or predicting) an 
observation; e is the prior evidence, that is, the knowledge available before 
the observation i is made. The following assumptions will occur in 
the theorems: 


A. ⊢ e . h ⊃ i; in other words, Max(i,e . h). In § 61, we called this the 
assumption of the predictability of the observation i. 

B. e . h is not L-false; in other words, not ⊢ e ⊃ ~h; hence, not 
Min(h,e). 

C. e . i is not L-false; in other words, not ⊢ e ⊃ ~i; hence, not Min(i,e). 
(B) and (C) serve merely to exclude trivial cases. 


The following theorem exhibits a connection between these assumptions. 


T85-8. Lemma. If (A) and (B) hold, then (C) holds. 


Proof, indirect. Suppose that (C) does not hold. Then ⊢ e ⊃ ~i and ⊢ e . h ⊃ 
~i. Hence, if (A) holds, e . h is L-false, in contradiction to (B). 


T85-9. Let assumption (A) hold. If Min(h,e . i), then Min(h,e). 


Proof. The condition implies that (1) e . i is not L-false (D84-2a), hence e is 
not L-false; (2) ⊢ e . i ⊃ ~h (D84-2b), hence ⊢ e . h ⊃ ~i, hence, because 
of (A), e . h is L-false, hence ⊢ e ⊃ ~h. Therefore, Min(h,e) (D84-2). 


T9 says in effect that, if the hypothesis is impossible after the predictable 
observation, then it was so before. 


T85-10. Let the conditions (A) and (B) be fulfilled; hence (C) holds 
too (T8). 
a. MC(h,e . i,h,e). (From T83-15c.) (Here (A) and (C) would suffice.) 
[This theorem corresponds to T61-3e.] 
b. Eq(h,e . i,h,e) if and only if Max(i,e). [Corresponds to T61-3f and g.] 


Proof. 1. Suppose that Eq(h,e . i,h,e). Then (T83-16) either (a) ⊢ e ⊃ h, 
hence because of (A) ⊢ e ⊃ i; or (c) ⊢ e ⊃ e . i, hence ⊢ e ⊃ i. The case (b) in 
T83-16 is excluded by (B). Thus in any case ⊢ e ⊃ i, hence Max(i,e). 2. Suppose 
that Max(i,e). Then ⊢ e ⊃ i, hence ⊢ e ≡ e . i (T21-5i(1)), hence Eq holds 
(T83-16c). 


c. Gr(h,e . i,h,e) if and only if not Max(i,e). [Corresponds to T61-3h.] 


Proof. 1. Let the condition with Gr be fulfilled. Suppose that Max(i,e), 
hence ⊢ e ⊃ i, hence ⊢ e ⊃ e . i. Thus in T83-17 both (d₁) and (d₂) would be 
violated, hence also (d), in contradiction to the assumption concerning Gr. 
Therefore the supposition made cannot hold. 2. Let Max(i,e) not hold. Then not 
⊢ e ⊃ i, hence not ⊢ e ⊃ e . i. Suppose now that MC(h,e,h,e . i) were to hold. 
Then, according to T83-15, one of the following three cases would hold: either 
(a) ⊢ e ⊃ h, hence because of (A) ⊢ e ⊃ i, in contradiction to the above result; 
or (b) ⊢ e . i ⊃ ~h, hence ⊢ e . h ⊃ ~i (T21-5h), hence because of (A) e . h 
would be L-false, in contradiction to (B); or (c) ⊢ e ⊃ h ∨ (e . i), hence ⊢ e ⊃ 
(e . h) ∨ i, hence because of (A) ⊢ e ⊃ i, in contradiction to the initial assumption. 
Therefore the supposition concerning MC cannot hold. From this 
and T10a the condition with Gr follows (D82-2). 


T10c says in effect that the confirmation of the hypothesis is increased 
by the predictable observation i, in other words, i is positively relevant 
to h on e, if and only if i was neither entailed nor excluded by the prior 
evidence e. 
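
The three parts of T85-10 can be checked mechanically on a small language. The sketch below is an illustration under assumptions, not a reproduction of the formal apparatus: sentences are represented by their ranges over the four state-descriptions of a two-atom toy language; the function `mc` encodes the conditions (a), (b), (c₁) to (c₃) of D81-1 as they are quoted in the proofs of this chapter (D81-1 itself is stated in an earlier section, so this encoding is a reconstruction); `gr` follows D82-2 as cited above; `eq` is assumed to mean that MC holds in both directions.

```python
from itertools import combinations, product

# Assumption: toy language with two atomic sentences, hence 4 state-descriptions.
STATES = list(product([False, True], repeat=2))
# All 16 ranges, i.e. all sentences up to L-equivalence.
ALL = [frozenset(combo)
       for r in range(len(STATES) + 1)
       for combo in combinations(STATES, r)]

def mc(h, e, h2, e2):
    """Assumed form of D81-1 (reconstructed from the proofs in this chapter)."""
    if not e or not e2:                       # (a), (b): e and e' are not L-false
        return False
    c1 = e <= h                               # (c1): e L-implies h
    c2 = not (e2 & h2)                        # (c2): e' L-implies ~h'
    c3 = (e2 & h2) <= (e & h) and e <= (h | e2)   # (c3): both parts
    return c1 or c2 or c3

def gr(h, e, h2, e2):
    """D82-2 as cited in the text: MC holds, the converse MC does not."""
    return mc(h, e, h2, e2) and not mc(h2, e2, h, e)

def eq(h, e, h2, e2):
    """Assumption: Eq means MC in both directions."""
    return mc(h, e, h2, e2) and mc(h2, e2, h, e)

def is_max(h, e):
    """D84-1: e is not L-false and e L-implies h."""
    return bool(e) and e <= h

checked = 0
for h, i, e in product(ALL, repeat=3):
    predictable = (e & h) <= i     # assumption (A): e . h L-implies i
    nontrivial = bool(e & h)       # assumption (B): e . h is not L-false
    if not (predictable and nontrivial):
        continue
    checked += 1
    assert mc(h, e & i, h, e)                         # T85-10a
    assert eq(h, e & i, h, e) == is_max(i, e)         # T85-10b
    assert gr(h, e & i, h, e) == (not is_max(i, e))   # T85-10c
```

The loop runs through every triple h, i, e of the toy language that satisfies (A) and (B); that the assertions pass there is, of course, no substitute for the general proofs given above.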
The following theorem makes a comparison between two observations 
i₁ and i₂, both predictable by the hypothesis h. 


T85-11. Let (A) and (B), and hence (C) too, hold for both i₁ and i₂. 
Let not Min(h,e . i₂). 
a. If MC(h,e . i₁,h,e . i₂), then MC(i₂,e,i₁,e). 

Proof. Let the condition be fulfilled. Then one of the three cases (a), (b), (c) 
in T83-15 must hold. In case (a), ⊢ e . i₁ ⊃ h, hence ⊢ e . i₁ ⊃ e . h, hence, because 
of (A) for i₂, ⊢ e . i₁ ⊃ i₂. Case (b) is impossible, for here ⊢ e . i₂ ⊃ ~h, 
hence ⊢ e . h ⊃ ~i₂, hence, because of (A) for i₂, e . h would be L-false, in 
contradiction to (B). In case (c), ⊢ e . i₁ ⊃ h ∨ (e . i₂), hence ⊢ e . i₁ ⊃ (e . h) 
∨ i₂, hence, because of (A) for i₂, ⊢ e . i₁ ⊃ i₂. Thus the latter holds in any case. 
Therefore MC(i₂,e,i₁,e) (T83-6). 


b. The converse of (a). If MC(i₂,e,i₁,e), then MC(h,e . i₁,h,e . i₂). 


Proof. Let the condition be fulfilled. Then ⊢ e . i₁ ⊃ i₂ (T83-6b), hence 
⊢ e . i₁ ⊃ h ∨ (e . i₂); this is the second part of T83-15c for the assertion. The 
first part, viz., ⊢ e . i₂ . h ⊃ e . i₁ follows from (A) for i₁. Hence (T83-15c) the 
assertion. 

c. If Gr(h,e . i₁,h,e . i₂), then Gr(i₂,e,i₁,e). [Corresponds to T61-5c.] 

Proof. The condition entails (D82-2) that MC(h,e . i₁,h,e . i₂) and not 
MC(h,e . i₂,h,e . i₁). Therefore MC(i₂,e,i₁,e) (from (a)) and not MC(i₁,e,i₂,e) 
(from (b)). Hence Gr(i₂,e,i₁,e) (D82-2). 


d. The converse of (c). If Gr(i₂,e,i₁,e), then Gr(h,e . i₁,h,e . i₂). (From 
(b) and (a).) [Corresponds to T61-5d.] 
e. If Eq(h,e . i₁,h,e . i₂), then Eq(i₂,e,i₁,e). (From (a) and (b).) [Corresponds 
to T61-5e.] 
f. The converse of (e). If Eq(i₂,e,i₁,e), then Eq(h,e . i₁,h,e . i₂). (From 
(b) and (a).) [Corresponds to T61-5f.] 


T11c and d say in effect this. The posterior confirmation of the hypothesis 
h after the observation i₁ is higher than that after the observation 
i₂ if and only if the expectedness of i₁, that is, its confirmation on 
the prior evidence e, is smaller than that of i₂. In other words, the more 
improbable the occurrence of a predictable event, the more does its observation 
increase the confirmation of the hypothesis. The corresponding 
result concerning c-functions has been discussed earlier (see remark (iii) 
on T60-1c). 
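
The quantitative counterpart of this remark can be illustrated with a small numerical case. The sketch below rests on assumptions that are not in the text: the "states" are the six outcomes of a die, the measure is uniform (one sample regular measure, not c*), e is tautological, and the sentences h, i₁, i₂ are chosen so that h predicts both observations. It is meant only to show the direction of the effect.

```python
from fractions import Fraction

# Assumption: six equally weighted states (outcomes of one throw of a die).
OUTCOMES = frozenset(range(1, 7))
m = {s: Fraction(1, 6) for s in OUTCOMES}

def c(h, e):
    """c(h,e) as a quotient of measures over the ranges of e . h and e."""
    return sum(m[s] for s in h & e) / sum(m[s] for s in e)

e = OUTCOMES                 # prior evidence: tautological
h = frozenset({6})           # hypothesis: the outcome is 6
i1 = frozenset({5, 6})       # observation i1: the outcome is at least 5
i2 = frozenset({2, 4, 6})    # observation i2: the outcome is even

# (A) holds for both observations: e . h L-implies i1 and i2.
assert (e & h) <= i1 and (e & h) <= i2
# (B) holds: e . h is not L-false.
assert e & h

prior = c(h, e)         # 1/6
post1 = c(h, e & i1)    # 1/2
post2 = c(h, e & i2)    # 1/3

# i1 is the less expected observation, and it raises the confirmation of h more.
assert c(i1, e) < c(i2, e)
assert post1 > post2 > prior
```

Here c(i₁,e) = 1/3 and c(i₂,e) = 1/2, while the posterior values are 1/2 and 1/3, respectively: the less expected of the two predictable observations yields the higher posterior confirmation.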

The following theorem makes a comparison between two hypotheses 
h₁ and h₂, by each of which the observation i is predictable. 


T85-12. Let (A) and (B), and hence (C) too, hold for both h₁ and h₂. 
a. If MC(h₁,e . i,h₂,e . i), then MC(h₁,e,h₂,e). 

Proof. Let the condition be fulfilled. Then e . i is not L-false (T83-6a), 
hence e is not. Further (T83-6b) ⊢ e . i . h₂ ⊃ h₁. (A) for h₂ says that ⊢ e . h₂ ⊃ i, 
hence ⊢ e . h₂ ⊃ i . e . h₂, hence with the former result ⊢ e . h₂ ⊃ h₁. This yields 
the assertion (T83-6). 

b. The converse of (a). If MC(h₁,e,h₂,e), then MC(h₁,e . i,h₂,e . i). 

Proof. The condition implies (T83-6b) that ⊢ e . h₂ ⊃ h₁, hence ⊢ e . i . h₂ ⊃ 
h₁. Thus T83-6b for the assertion is fulfilled; likewise T83-6a because of (C). 

c. If Gr(h₁,e . i,h₂,e . i), then Gr(h₁,e,h₂,e). (From (a) and (b); the proof 
is analogous to that of T11c.) 

d. The converse of (c). If Gr(h₁,e,h₂,e), then Gr(h₁,e . i,h₂,e . i). (From 
(b) and (a).) [(c) and (d) correspond to T61-6c.] 

e. If Eq(h₁,e . i,h₂,e . i), then Eq(h₁,e,h₂,e). (From (a) and (b).) [Corresponds 
to T61-6d.] 

f. The converse of (e). If Eq(h₁,e,h₂,e), then Eq(h₁,e . i,h₂,e . i). (From 
(b) and (a).) [Corresponds to T61-6e.] 


T12f and e say in effect this. If the two hypotheses have equal prior confirmation 
(i.e., on evidence e alone), then they have also equal posterior 
confirmation (i.e., on evidence e . i), and vice versa. T12d and c say this. 
If the prior confirmation of one hypothesis is higher than that of the other 
hypothesis, then its posterior confirmation is likewise higher than that of 
the other, and vice versa. The corresponding results for c-functions have 
been discussed earlier (in § 61). 


This concludes our outline of a system of comparative inductive logic. 
The system deals with comparative concepts of confirmation, MC and 
other concepts. These concepts are purely comparative, nonquantitative, 
in the sense that they do not presuppose any confirmation concepts with 
numerical values (c-functions). Instead, they are defined on the basis of 
the simple L-concepts (L-implication, etc.). Since the latter concepts con- 
stitute the basis of deductive logic, comparative inductive logic may be 
regarded as a simple extension of deductive logic. Comparative inductive 
relations between sentences are not identical with the ordinary deductive 
relations between them (e.g., L-implication), but they are uniquely determined 
by the latter. This may be illustrated by the following simple 
example. A hypothesis h is more strongly confirmed (in the comparative 
sense, as expressed by ‘Gr’) than h' by the evidence e if and only if the 
following deductive relations hold: e . h' L-implies h, and e . h does not 
L-imply h' (T83-11). 

The present outline may suffice to show in general what kinds of re- 
sults are obtainable on a purely comparative basis. We shall not pursue 
this course further, because we believe in the possibility of a quantitative 
inductive logic. For those, however, who regard a quantitative inductive 
logic either as entirely impossible or, like Kries and Keynes, as possible 
only within certain very narrow limits, the construction of a more compre- 
hensive comparative system on the basis here supplied or a similar one 
might be an important task. 


§ 86. The Concept of Confirming Evidence 


Of the three semantical concepts of confirmation (§ 8), we have so far discussed 
the quantitative (degree of confirmation c) and the comparative (MC). 
In the last three sections of this chapter we shall discuss the problem of explicating 
the classificatory concept of confirmation: ‘i is confirming evidence for the 
hypothesis h on the basis of e’, in symbols: ‘C(h,i,e)’. This means in terms of c 
that the c of h is increased by adding the new evidence i to the prior evidence e; 
in other words, i is positively relevant to h on e (§ 65). We define a relation C' 
(8); the essential condition for C'(h,i,e) is that either ⊢ e . h ⊃ i or ⊢ e . i ⊃ h. 
It is found that C' holds if and only if the condition in terms of c mentioned 
above is fulfilled for every regular c (11). However, it is found that C' is too 
narrow as an explicatum. A definition for a relation C* is indicated (5) such 
that the above condition is fulfilled for our c-function c* to be introduced later; 
this is a definition in quantitative terms. We leave the question open whether 
an adequate explicatum can be found that is defined in nonquantitative terms 
(like C' and MC). 


We have earlier (§ 8) distinguished three semantical concepts of con- 
firmation: (i) the classificatory concept of confirmation, the concept of 
confirming evidence, (ii) the comparative concept (‘more or equally con- 
firmed’), and (iii) the quantitative concept, the concept of degree of confir- 
mation. The first of these three concepts is the simplest; the second is 
more complicated but also more efficient; the third is still more efficient, 
provided an adequate explicatum of this kind can be found. Our discus- 
sions do not take up the problems of these three concepts in the order just 
mentioned, which is the order of increasing complexity, but rather in the 
opposite order. We have first dealt with the regular c-functions (in the two 
preceding chapters) ; they—or rather, some of them—come into considera- 
tion as explicata for the quantitative concept of confirmation. Only after- 
ward, in the preceding sections of this chapter, did we study the problem 
of the comparative concept and propose the relation MC as an explicatum 
for it. The discussion of this problem was postponed for the following 
reason. Although the definition itself of the concept MC is in purely comparative, 
nonquantitative terms, that is, it does not refer to the c-functions, 
nevertheless the conditions of adequacy for the comparative concept 
do refer to the c-functions. Therefore it seemed advisable, from the 
heuristic point of view, to take up the study of the problem of the comparative 
concept only after a theory of the regular c-functions had been 
constructed. For the same reason the discussion of the simplest concept, 
the classificatory concept, was postponed. 

We distinguish two forms of the classificatory concept of confirming 
evidence. The general form is relative to some evidence e. ‘i confirms h on 
the basis of e’ is understood in the following sense: i is an additional item 
of evidence which, if added to the prior evidence e, contributes positively 
to the confirmation of the hypothesis h. In particular, the concept is applicable 
to the following situation: e represents the prior evidence (in the 
sense of § 60, that is, the evidence available before the results i are found); 
i describes new observational results, for instance, results of experiments 
made in order to test the hypothesis h. We shall use the symbol ‘C’ for any 
explicatum of this classificatory concept that might be considered. Thus 
an explicatum of the above statement will be symbolized by ‘C(h,i,e)’ 
(‘h is confirmed by i on the basis e’). The second form of the concept 
means simply that i is confirming evidence for h so to speak absolutely; 
that is, without reference to any prior factual evidence e. More exactly 
speaking, it refers to that special case of the first concept, where no prior 
factual evidence is available, in other words, where e is the tautology ‘t’. 
We might say in this case that i is initially (or a priori) confirming evidence 
for h. For an explicatum of this second concept we shall use the symbol 
‘C₀’, in analogy to ‘c₀’ (D57-1), which likewise refers to the evidence ‘t’. 
Thus, if a function C is given, we define: 

(1) C₀(h,i) =Df C(h,i,t) . 
(‘C’ and ‘C₀’ are not symbols of the object-languages 𝔏, but predicates 
in the metalanguage like ‘MC’, ‘Gr’, etc.) 

We began the study of the problem of an explication for the comparative 
concept by investigating the relation which must hold between any 
adequate explicatum MC and the regular c-functions (§ 80). We shall 
now do the same for the classificatory concept. Suppose we had a regular 
c-function c which we regarded as an adequate explicatum of the quantitative 
concept. How could we express with its help the classificatory concept 
of confirming evidence? This concept means that the degree of confirmation 
of h is increased by the addition of i to e; hence it is expressible 
in terms of c as follows: 

(2) c(h,e . i) > c(h,e) . 
(This condition implies that e and e . i are non-L-false, because otherwise 
c would not have values for the sentences in question.) Therefore we shall 
say that a triadic relation R among sentences, considered as an explicatum 
for the classificatory concept, is in accord with a given c-function c under 
the following condition: 
(3) R is in accord with c =Df for any sentences h, i, and e, if R(h,i,e), then 
c(h,e . i) > c(h,e) , 


in other words (D65-1a), i is positively relevant to h on e with respect to c. 

For C₀, ‘t’ takes the place of e. Thus here the condition (2) is replaced by: 
(4) c(h,i) > c(h,t) , 
in other words (D65-2a), i is initially positive to h. 

We shall later introduce a particular c-function c* as our explicatum 
for the quantitative concept (§ 110A). On the basis of this function we 
might then introduce a concept of confirming evidence C* and a concept 
of initially confirming evidence C₀* by explicit definitions in terms of c* as 
follows: 

(5) C*(h,i,e) =Df c*(h,e . i) > c*(h,e) ; 

(6) C₀*(h,i) =Df c*(h,i) > c*(h,t) . 

An analogous procedure is possible on the basis of any other function c 
chosen as explicatum. However, definitions of this kind would not yield 
purely classificatory concepts but only classificatory concepts quantitatively 
defined. The concepts C* and C₀* may be useful; they represent 
indeed positive relevance and initial positive relevance, respectively, 
with respect to c*. If, however, we are looking for purely classificatory 
concepts, then these definitions do not supply a solution. 

A purely classificatory concept C would be a concept adequate as an 
explicatum for the classificatory explicandum and defined without the 
use of any quantitative concepts like c-functions; it might instead be defined, 
like MC (D81-1), in terms of L-concepts. We shall not give a solution 
of this problem but only indicate, in these last three sections of this 
chapter, some considerations relevant for the problem and discuss a few 
concepts which might be considered as explicata. 

Instead of basing the classificatory concept C on one particular c-function, 
one might think of requiring that it be in accord with all regular 
c-functions. If, moreover, C is required to be the most comprehensive 
relation fulfilling the first requirement, then the following would be a 
necessary and sufficient condition for C: 

(7) for every regular c, c(h,e . i) > c(h,e) . 
This procedure would be analogous to the earlier one concerning MC. 
There we laid down a necessary and sufficient condition referring to all 
c-functions (§ 80, (4)). Then we defined the relation MC in terms of L-concepts 
(D81-1) and showed that it fulfilled the requirement (for 𝔏_N, 
T81-1). We can now proceed here similarly. We shall define the relation 
C' in terms of L-concepts, hence in a nonquantitative way; then we shall 
show that it satisfies the quantitative requirement stated above. The concept 
C' is here merely offered for discussion but not proposed as an explicatum. 
We shall later indicate some reasons which make its adequacy appear 
doubtful. The definition of C' is as follows: 


(8) C'(h,i,e) =Df the following three conditions are fulfilled: 
a. e . i . h is not L-false; 
b. e . ~i . ~h is not L-false; 
c. Either ⊢ e . h ⊃ i or ⊢ e . i ⊃ h or both. 


We shall now develop several sufficient and necessary conditions for C'. 
This will lead to the result that C' satisfies the quantitative requirement. 
We restrict this consideration to finite systems; all the results hold likewise 
for nongeneral sentences in the infinite system (compare T81-1 and 2). 

For sentences in a finite system, C' can be expressed in terms of relevance 
concepts as follows. 
(9) For any triple of sentences h,i,e in 𝔏_N, C'(h,i,e) if and only if i is either 
extremely positive to h on e (D74-1a) with respect to every regular c-function 
or completely positive to h on e (D75-1a) with respect to every regular 
c-function. 


Proof. 1. Suppose that C'(h,i,e). Then k₂ or k₃ is L-false but k₁ and k₄ are 
not (from (8); for k₁, etc., and m₁, etc., see the explanations preceding T65-1). 
Therefore, for every regular m, m₂ or m₃ is 0 but m₁ and m₄ are not. Therefore, 
i is either extremely positive (T74-1c) with respect to every c (T74-7a) or 
completely positive (T75-1c) with respect to every c (T75-7a). 2. Let i be 
extremely positive to h on e. Then m₁ and m₄ are > 0 (T74-1c), hence k₁ and k₄ 
are not L-false; and ⊢ e . i ⊃ h (T74-1d). Hence C'(h,i,e). 3. Let i be completely 
positive to h on e. Then k₁ and k₄ are not L-false (T75-1c); and ⊢ e . h ⊃ i 
(T75-1d). Hence C'(h,i,e). 


(10) For any triple of sentences h,i,e in 𝔏_N, 𝔆′(h,i,e) if and only if i is
positive to h on e with respect to every regular c-function. (From (9), T76-4a.)
(11) For any triple of sentences h,i,e in 𝔏_N, 𝔆′(h,i,e) if and only if, for
every regular c-function c, c(h,e . i) > c(h,e). (From (10), D65-1a.)

(11) says that the condition (7) is sufficient and necessary for 𝔆′. Thus
𝔆′, as defined by (8), satisfies our quantitative requirement: 𝔆′ is in
accord with every regular c-function and it is the most comprehensive
relation fulfilling this condition. [We see from (11) that the sentence
‘𝔆′(h,i,e)’ means something similar to ‘Gr(h,e . i,h,e)’. However, there is the
following difference. Gr(h,e . i,h,e) if and only if ‘≥’ holds between c(h,e . i)
and c(h,e) for every c and ‘>’ holds for at least one c (T82-1b). On the other
hand, 𝔆′(h,i,e) if and only if ‘>’ (and hence ‘≥’) holds for every c. Therefore,
if 𝔆′(h,i,e), then Gr(h,e . i,h,e); but the converse does not always hold.]
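The content of (8) and (11) can be checked mechanically on a very small finite system. The following is only a sketch under stated assumptions: a sentence is represented by its range (the set of state-descriptions in which it holds), so that a sentence is L-false if its range is empty and ⊢ a ⊃ b holds if the range of a is included in that of b. The four-element set WORLDS, the names c_prime and check_11, and the sample m-functions are hypothetical choices made for illustration; testing a few m-functions is a sanity check, not a proof of (11).

```python
from fractions import Fraction
from itertools import product

WORLDS = range(4)  # hypothetical state-descriptions of a tiny finite system


def c(m, h, e):
    """c(h,e) = m(e . h) / m(e); e must not be L-false (nonempty range)."""
    return Fraction(sum(m[w] for w in e & h), sum(m[w] for w in e))


def c_prime(h, i, e):
    """Definition (8), with sentences given as ranges (frozensets of worlds)."""
    everything = frozenset(WORLDS)
    not_i, not_h = everything - i, everything - h
    cond_a = bool(e & i & h)                 # e . i . h is not L-false
    cond_b = bool(e & not_i & not_h)         # e . ~i . ~h is not L-false
    cond_c = (e & i) <= h or (e & h) <= i    # e . i L-implies h, or e . h L-implies i
    return cond_a and cond_b and cond_c


def all_sentences():
    """Every sentence of the system, up to L-equivalence (all 16 ranges)."""
    for bits in product([0, 1], repeat=len(WORLDS)):
        yield frozenset(w for w in WORLDS if bits[w])


# A few regular m-functions (positive on every state-description);
# the weights need not sum to 1 because c is a ratio.
SAMPLE_M = [[1, 1, 1, 1], [1, 2, 3, 4], [7, 1, 1, 5], [1, 9, 2, 1]]


def check_11():
    """Count the triples satisfying (8); each must raise c for every sample m."""
    count = 0
    for h, i, e in product(list(all_sentences()), repeat=3):
        if c_prime(h, i, e):
            count += 1
            for m in SAMPLE_M:
                assert c(m, h, e & i) > c(m, h, e)
    return count


n_instances = check_11()
```

If any triple satisfying (8) failed to increase c for one of the sample m-functions, the assertion inside check_11 would fail.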

Now we have to study the question of adequacy. Suppose that a rela- 
tion R among sentences is considered as a possible explicatum for the 
concept of confirming evidence as explicandum. Since the explicandum is 
vague, there will be no general agreement for all cases whether it holds or 
not; but there are certain cases for which there is practical agreement that 
the explicandum holds, other cases for which there is practical agreement 
that it does not hold, while in still other cases there is no agreement. Now 
we might say that R is clearly too wide if we find cases in which R holds but 
the explicandum clearly (i.e., with practically general agreement) does 
not hold. And we might say that R is clearly too narrow if we find cases in 
which R does not hold but the explicandum clearly holds. It is possible 
that R is clearly too wide in one direction and clearly too narrow in 
another. 

Let us examine the question whether 𝔆′ may be regarded as an adequate
explicatum for the classificatory concept of confirmation. 𝔆′ is certainly
not too wide. Whenever 𝔆′(h,i,e) holds, everybody will agree that i
is confirming evidence for h on e because, no matter which particular
c-function he has chosen, he will find that the c of h is increased by the
addition of i to e; this is seen from (11). The question remains whether 𝔆′ is
not too narrow. The definition of 𝔆′ requires that either ⊢ e . i ⊃ h or
⊢ e . h ⊃ i. In the first case h follows from e together with i; this means
that, after the additional observation i, the hypothesis h is certain. In
the second case, i follows from e together with h; i is a predictable
observation in our earlier sense (§ 61). These two cases together are far from
covering all instances in which the explicandum clearly holds, that is to
say, all those instances in which, according to customary inductive
thinking, i would be regarded as confirming evidence for h with respect to e.
This is shown by the following examples. Thus 𝔆′ is clearly too narrow.


Counterexamples to 𝔆′.

1. Let h be a simple law of conditional form (§ 38), say ‘(x)(Mx ⊃ M′x)’
(‘all swans are white’) for a finite domain of individuals (N > 1). Let e be the
tautology ‘t’. Let i be ‘Mb . M′b’ (‘b is a swan and b is white’). i would generally
be regarded as a confirming instance for h even without any prior evidence.
However, neither ⊢ h ⊃ i nor ⊢ i ⊃ h; hence 𝔆′ does not hold.

2. Let ‘P’ be a primitive predicate. Let e be ‘Pa₁ . ~Pa₂’. Let i be
‘Pa₃ . Pa₄ . ... . Pa₁₂’, a conjunction of ten full sentences of ‘P’. Let h be the
singular prediction ‘Pa₁₃’. According to customary inductive thinking, h is
regarded as more probable after the observations reported in i than before; in
other words, i is regarded as confirming evidence for h on e. However, neither
⊢ e . h ⊃ i nor ⊢ e . i ⊃ h; hence 𝔆′ does not hold.
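The second counterexample can be made numerically concrete. The sketch below assumes a system with the single primitive predicate ‘P’ and thirteen individuals, reads e as one positive and one negative instance, i as ten further positive instances, and h as the prediction for a thirteenth individual, and uses the function c* (defined elsewhere in this work), which is only one among the regular c-functions. The helper names m_star, c_star and l_implies are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 13  # individuals a1..a13; state[k-1] is the truth value of 'P a_k'
STATES = list(product([0, 1], repeat=N))


def m_star(state):
    """m* for one primitive predicate: each structure-description (fixed
    number of P-individuals) gets 1/(N+1), shared equally among its
    comb(N, k) state-descriptions."""
    k = sum(state)
    return Fraction(1, (N + 1) * comb(N, k))


def c_star(h, e):
    """c*(h,e) = m*(e . h) / m*(e); sentences are predicates on states."""
    num = den = Fraction(0)
    for state in STATES:
        if e(state):
            weight = m_star(state)
            den += weight
            if h(state):
                num += weight
    return num / den


def l_implies(a, b):
    """a L-implies b: b holds in every state-description in which a holds."""
    return all(b(s) for s in STATES if a(s))


def e(s):  # 'Pa1 . ~Pa2'
    return s[0] == 1 and s[1] == 0


def i(s):  # ten further positive instances, a3..a12
    return all(s[k] == 1 for k in range(2, 12))


def h(s):  # the singular prediction for a13
    return s[12] == 1


def e_and_i(s):
    return e(s) and i(s)


def e_and_h(s):
    return e(s) and h(s)


prior = c_star(h, e)            # Fraction(1, 2)
posterior = c_star(h, e_and_i)  # Fraction(6, 7)
meets_8c = l_implies(e_and_i, h) or l_implies(e_and_h, i)  # False
```

On these assumptions c* rises from 1/2 to 6/7 although condition (8c) fails, so that 𝔆′ does not hold in a case where the explicandum clearly does.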


The definition of 𝔆′ is constructed in such a way that 𝔆′ satisfies the
requirement that it be in accord with all regular c-functions. We have
found that 𝔆′ is too narrow. The reason is that the requirement mentioned
is too strong. The definition can be made wider and thereby more
adequate if we require only that 𝔆 be in accord with some regular c-functions.
It is not difficult to find more restricted classes of c-functions for which it
seems still plausible that they contain all adequate quantitative explicata
for probability₁. We might, for instance, take the class of all those
c-functions which fulfil the condition of symmetry which will be discussed in
the next chapter (D91-1); we might also add further conditions which
seem plausible (for instance, symmetry with respect to basic matrices
(§ 28), and other conditions to be discussed in later chapters). However,
if we choose any such class of c-functions and then require that 𝔆 be the
most comprehensive relation which is in accord with just these c-functions,
then it appears rather doubtful whether it is possible to construct a
definition for this 𝔆 of not too complicated structure in terms of L-concepts,
like the definitions of MG and 𝔆′.

Incidentally, it would be of interest to investigate the possibility of ap- 
plying the method just outlined also to the problem of explicating the 
comparative concept; that is to say, the possibility of an explicatum which 
is wider than MG because based on a narrower class of c-functions. How- 
ever, in this case likewise it seems doubtful whether a simple definition in 
L-terms can be found. 

We shall not try here to find an adequate explicatum defined in
nonquantitative terms for the classificatory concept either by the method just
indicated or by any other method. For our system of inductive logic the
theory of confirming evidence is represented first by the general theory of
relevance for regular c-functions as developed in the preceding chapter,
and second by the theory of 𝔆* or, in other words, of positive relevance
with respect to c* to be developed later. It is true that these theories are
quantitative, but from our point of view this fact is not necessarily a
disadvantage. The task of finding an adequate explicatum for the
classificatory concept of confirmation defined in purely classificatory, that is,
nonquantitative terms is certainly an interesting problem; but it is chiefly of
importance for those who do not believe that an adequate explicatum for
the quantitative concept of confirmation can be found.

§ 87. Hempel’s Analysis of the Concept of Confirming Evidence


Some interesting investigations by Hempel concerning the concept of con- 
firming evidence are here discussed. Hempel shows correctly that two wide- 
spread conceptions are too narrow: Nicod’s criterion (the law ‘all swans are 
white’ is confirmed by observations of white swans and only by these) and the 
prediction-criterion (a hypothesis is confirmed by given evidence if and only if 
one part of this evidence can be deduced from the other part with the help of 
the hypothesis). Hempel lays down some general conditions which, in his view, 
a concept must fulfil in order to be an adequate explicatum of the concept of 
confirming evidence. It is shown that some of these conditions are not valid, 
that is to say, no adequate explicatum can fulfil them. 


In this and the next sections we shall discuss investigations made by 
Carl G. Hempel concerning confirmation in general and especially the 
classificatory concept. The following discussion is chiefly based on an 
article of his published in two parts in Mind 1945 ([Studies]; references in 
the following are to this article); some of his technical results had been 
published previously ([Syntactical], 1943). The first-mentioned article 
gives a clear and illuminating exposition of the whole problem situation 
concerning confirmation and the distinction between the classificatory, 
the comparative, and the quantitative concepts of confirmation. A num- 
ber of points in this problem complex are here clarified for the first time. 
For instance, Hempel’s distinction between the pragmatical concept of the 
confirmation of a hypothesis by an observer and the logical (semantical) 
concept of the confirmation of a hypothesis on the basis of an evidence 
sentence is important; likewise his distinction of the three phases in the 
procedure of testing a given hypothesis (op. cit., p. 114): making observa-
tions, confronting the hypothesis with the observation report, accepting
or rejecting the hypothesis. These distinctions are valuable tools for clarify- 
ing the situation for many discussions and controversies at the present 
time concerning confirmation, the foundations of empiricism, verifiability, 
and related problems. 

The main part of Hempel’s article concerns the problem of an explication
for the classificatory concept of confirmation. We shall now discuss
his views in detail. His explicandum is as follows: a sentence (or a class of
sentences, or perhaps an individual) represents confirming (corroborating,
favorable) evidence or constitutes a confirming instance for a given
hypothesis. In his general discussion and in the examples, no reference is
made to any prior evidence. Thus Hempel’s explicandum corresponds to
our dyadic relation 𝔆₀(h,i) (‘h is confirmed by i’) rather than to the triadic
relation 𝔆(h,i,e) (‘h is confirmed by i on the basis of the prior evidence e’).
Therefore we shall in the following compare the explicata discussed by
Hempel with 𝔆₀.

Hempel starts with a critical discussion of an explicatum which seems
widely accepted (op. cit., pp. 9 ff.); he quotes the following passages by
Jean Nicod as a clear formulation for it: “Consider the formula or the law:
A entails B. How can a particular proposition, or more briefly, a fact, affect
its probability? If this fact consists of the presence of B in a case of A, it is
favourable to the law ‘A entails B’; on the contrary, if it consists of the
absence of B in a case of A, it is unfavourable to this law. It is conceivable
that we have here the only two direct modes in which a fact can influence
the probability of a law. ... Thus, the entire influence of particular
truths or facts on the probability of universal propositions or laws would
operate by means of these two elementary relations which we shall call
confirmation and invalidation” ([Induction], p. 219). Hempel refers here
also to R. M. Eaton’s discussion on “Confirmation and Infirmation”
([Logic], chap. iii), which is based on Nicod’s conception. Thus, according
to Nicod’s criterion, the fact that the individual b is both M and M′, or
the sentence ‘Mb . M′b’ describing this fact, is confirming evidence for the
law ‘(x)(Mx ⊃ M′x)’. Hempel discusses this criterion in detail, and I
agree entirely with his views. As he points out, the criterion is applicable
only to a quite special, though important, form of hypothesis. But even if
restricted to this form, the criterion does not constitute a necessary
condition; in other words, it is clearly too narrow (in the sense of § 86). Hempel
shows that it is not in accord with the Equivalence Condition for
Hypotheses (see below, H8.22). For instance, ‘Mb . M′b’ is confirming
evidence, according to Nicod’s criterion, for the law stated above, but not
for the L-equivalent law ‘(x)(~M′x ⊃ ~Mx)’. This is an instance of
what Hempel calls the paradox of confirmation. He discusses this paradox
in detail and reveals its main sources (op. cit., pp. 13-21). [We have briefly
indicated this paradox earlier (§ 46) and we shall discuss it later (in Vol.
II) in connection with the universal inductive inference; we shall try to
throw some light on the problem from the point of view of our inductive
logic; our results will essentially be in agreement with Hempel’s views.]
Nicod’s criterion may be taken as a sufficient condition for the concept of
confirming evidence if it is restricted to laws of the form mentioned with
only one variable. That in the case of laws with several variables it is not
even sufficient is shown by Hempel with the help of the following
counterexample (op. cit., p. 13 n), which is interesting and quite surprising. Let
the hypothesis be the law ‘(x)(y)[~(Rxy . Ryx) ⊃ (Rxy . ~Ryx)]’.
[Incidentally, by an unfortunate misprint in the footnote mentioned, the
second conjunctive component in the antecedent was omitted.] Now the
fact described by ‘Rab . ~Rba’ fulfils both the antecedent and the
consequent in the law; hence this fact should be taken as a confirming case
according to Nicod’s criterion. However, since the law stated is L-equivalent
to ‘(x)(y)Rxy’, the fact mentioned is actually disconfirming.
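The equivalence on which Hempel's counterexample rests can be checked by brute force on small domains. The sketch below enumerates every dyadic relation R on domains of one to three individuals and compares the law with the universal sentence; it also confirms that no model containing the fact ‘Rab . ~Rba’ satisfies the law. The function names are hypothetical, and agreement on small finite domains is an illustration, not a proof of L-equivalence.

```python
from itertools import product


def implies(p, q):
    """Truth-functional conditional."""
    return (not p) or q


def law(R, dom):
    """The law with two variables: for all x, y, if not (Rxy and Ryx),
    then (Rxy and not Ryx)."""
    return all(
        implies(
            not ((x, y) in R and (y, x) in R),
            (x, y) in R and (y, x) not in R,
        )
        for x in dom
        for y in dom
    )


def universal(R, dom):
    """The sentence saying that R holds for every pair."""
    return all((x, y) in R for x in dom for y in dom)


def all_relations(dom):
    """Every dyadic relation on dom, as a set of ordered pairs."""
    pairs = [(x, y) for x in dom for y in dom]
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, b in zip(pairs, bits) if b}


def equivalent_up_to(n_max):
    """True if the two sentences agree in every model of size 1..n_max."""
    for n in range(1, n_max + 1):
        dom = range(n)
        for R in all_relations(dom):
            if law(R, dom) != universal(R, dom):
                return False
    return True


agree = equivalent_up_to(3)

# Models on two individuals a = 0, b = 1 in which 'Rab . ~Rba' holds:
dom2 = range(2)
law_survives = any(
    law(R, dom2)
    for R in all_relations(dom2)
    if (0, 1) in R and (1, 0) not in R
)
```

Here agree comes out True and law_survives False: the alleged confirming instance is incompatible with the law.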

Hempel proposes (p. 22) to take the concept of confirming evidence not,
like Nicod, as a relation between an object or fact and a sentence, but as a
semantical relation (or, alternatively, a syntactical, i.e., purely formal,
relation) between two sentences, as we do with 𝔆₀ (and c). A language
system L is presupposed. The primitive predicates in L designate directly
observable properties or relations. An observation sentence is a basic
sentence (atomic sentence or negation, D16-6b) in L. An observation report
in the narrower sense is a class or conjunction of a finite number of
observation sentences (op. cit., p. 23); an observation report in the wider
sense is any nongeneral sentence. We shall henceforth use the term in the
wider sense. [Hempel uses the wider sense in the more technical paper
[Syntactical], p. 126. In the text of [Studies] he uses the narrower sense,
but he mentions the wider sense in footnotes (pp. 108, 111) and declares
that the narrower sense was used in the text only for greater convenience
of exposition and that all results, definitions, and theorems remain
applicable if the wider sense is adopted. Thus our use of the wider sense is
justified; it will facilitate the construction of some examples.] Hempel
admits also contradictory sentences as observation reports (p. 103,
footnote 1); however, we shall exclude them, in accord with our general
requirement that the evidence referred to by any confirmation concept be
non-L-false. (This requirement was later accepted by Hempel [Degree],
p. 102; our exclusion here will not affect the results of the subsequent
discussion of Hempel’s views.) Hempel restricts the evidence e referred to by
the concept of confirmation to observation reports, but the hypothesis h
may be any sentence of the language L. The structure of L is similar to
that of our systems 𝔏 except that L does not contain a sign of identity.
Hempel makes (op. cit., pp. 97 ff.) a critical examination of another
explicatum of the concept of confirming evidence, which is often used at
least implicitly and which at first glance appears as quite plausible. This
explicatum, which Hempel calls the prediction-criterion of confirmation,
is based on the consideration that it is customary to regard a hypothesis
as confirmed if a prediction made with its help is borne out by the facts.
This consideration suggests the following definition: An observation
report 𝔎ᵢ confirms the hypothesis h =Df 𝔎ᵢ can be divided into two
mutually exclusive subclasses 𝔎ᵢ₁ and 𝔎ᵢ₂ such that 𝔎ᵢ₂ is not empty, and every
sentence of 𝔎ᵢ₂ can be logically deduced from (i.e., is L-implied by) 𝔎ᵢ₁
together with h but not from 𝔎ᵢ₁ alone. Hempel shows that this concept
is indeed a sufficient condition for the explicatum sought, but not a
necessary condition; in other words, it is not too wide, but it is clearly too
narrow. The chief reason is the obvious fact that most scientific hypotheses
do not simply express a conditional connection between observable
properties but have a more general and often more complex form. This is
illustrated by the simple example of the sentence ‘(x)[(y)R₁xy ⊃ (∃z)R₂xz]’
in an infinite universe, where R₁ and R₂ are observable relations. If we take
any instance of this universal sentence, say with ‘b’ for ‘x’, then we see
that the antecedent (i.e., ‘(y)R₁by’) is not L-implied by any finite class
of observation sentences, and that the consequent (i.e., ‘(∃z)R₂bz’) does
not L-imply any observation sentence. This shows that it is “a
considerable over-simplification to say that scientific hypotheses and theories
enable us to derive predictions of future experiences from descriptions of
past ones” (p. 100). The logical connection which a scientific hypothesis
establishes between observation reports is in general not merely of a
deductive kind; it is rather a combination of deductive and nondeductive
steps. The latter are inductive in one wide sense of this word; Hempel
calls them ‘quasi-inductive’.

After these discussions of Nicod’s criterion and the prediction-criterion 
resulting in the rejection of both explicata as too narrow, Hempel proceeds 
to the positive part of his discussion. He states a number of general condi- 
tions for the adequacy of any explicatum for the concept of confirming 
evidence (pp. 102 ff.); we shall discuss them in the present section. Then 
he defines his own explicatum and shows that it fulfils the conditions of 
adequacy; this will be discussed in the next section. Hempel’s conditions of 
adequacy are as follows (‘H’ is here attached to his numbers) ; the evidence 
e is always an observation report as explained earlier, while the hypothesis 
h may be any sentence of the language L. 


(H8.1) Entailment Condition: If h is entailed by e (i.e., ⊢ e ⊃ h), then e
confirms h.

(H8.2) Consequence Condition: If e confirms every sentence of the class
𝔎ᵢ and h is a consequence of (i.e., L-implied by) 𝔎ᵢ, then e
confirms h.

The following two more special conditions follow from H8.2.

(H8.21) Special Consequence Condition: If e confirms h, then it also
confirms every consequence of h (i.e., sentence L-implied by h).

(H8.22) Equivalence Condition for Hypotheses: If h and h′ are
L-equivalent and e confirms h, then e confirms h′.

(H8.3) Consistency Condition: The class whose elements are e and all
the hypotheses confirmed by e is consistent (i.e., not L-false).

The following two more special conditions follow from H8.3.

(H8.31) If e and h are incompatible (i.e., L-exclusive, e . h is L-false),
then e does not confirm h.

(H8.32) If h and h′ are incompatible (i.e., L-exclusive), then e does not
confirm both h and h′.

(H8.4) Equivalence Condition for Observation Reports (op. cit.,
p. 110 n.): If e and e′ are L-equivalent and e confirms h, then e′
confirms h.


Now we shall examine these conditions of adequacy stated by Hempel.
We interpret these conditions as referring to the concept of initial
confirming evidence as explicandum; we shall soon come back to the question
whether Hempel has not sometimes a different explicandum in mind. Thus
we shall apply the conditions to 𝔆₀; but when we accept one of them, we
shall state not only a condition (b) for 𝔆₀, but first a more general
condition (a) for 𝔆; (b) is then a special case of (a) with ‘t’ for ‘e’. It is
presupposed for (a) that e . i is non-L-false, because otherwise c(h,e . i) would
have no value and hence the subsequent condition (2) could not be
applied; and it is presupposed for (b) that i is not L-false. Our statements of
conditions will have the same numbers as Hempel’s but with ‘C’ instead
of ‘H’. For this discussion we remember that we found that 𝔆 is the same
as positive relevance and 𝔆₀ the same as initial positive relevance;
therefore we shall make use of the results concerning relevance concepts stated
in the preceding chapter. Our examination will be based on the view that
any adequate explicatum for the classificatory concept of confirmation
must be in accord with at least one adequate explicatum for the
quantitative concept of confirmation; in other words, a relation 𝔆₀ proposed as
explicatum cannot be accepted as adequate unless there is at least one
c-function c, which is an adequate explicatum for probability₁, such that,
if 𝔆₀(h,i), then
(1) c(h,i) > c(h,t).

Analogously, it is necessary for the adequacy of a proposed explicatum 𝔆
that there is at least one adequate c such that, if 𝔆(h,i,e), then
(2) c(h,e . i) > c(h,e).

In examining Hempel’s statements of conditions of adequacy or our
subsequent statements, we shall regard such a statement as valid if there
is at least one explicatum 𝔆₀ (or 𝔆) which is adequate in the sense just
explained, i.e., in accord with an adequate c-function, and which satisfies
the statement generally, i.e., for any sentences as arguments.

The entailment condition H8.1 may appear at first glance as quite plau- 
sible. And it is indeed valid in ordinary cases. However, it does not hold in 
some special cases as we shall see by the subsequent counterexamples. 
Therefore we restate it in the following qualified form. 


(C8.1) Entailment Condition. Let h be either a sentence in a finite
system or a nongeneral sentence in the infinite system.

a. If ⊢ e . i ⊃ h and not ⊢ e ⊃ h, then 𝔆(h,i,e).

b. If ⊢ i ⊃ h and h is not L-true, then 𝔆₀(h,i).


The following theorem shows that the entailment condition in the 
modified form C8.1 is valid. 


T87-1.
a. Any instance of the relation 𝔆 which is required by C8.1a is in accord
with every regular c-function.

Proof. Let ⊢ e . i ⊃ h and not ⊢ e ⊃ h. It was presupposed that e . i is not
L-false. Therefore, for every regular c, c(h,e . i) = 1 (T59-1b) and c(h,e) < 1
(T59-5a). Thus this instance of 𝔆 is in accord with c ((3) in § 86).

b. Any instance of 𝔆₀ required by C8.1b is in accord with every regular
c-function. (From (a), with ‘t’ for e.)

In C8.1a, we have excluded the case that ⊢ e ⊃ h. This restriction is
necessary, because in this case c(h,e) = 1 = c(h,e . i); hence c is not
increased. For the same reason, the case that h is L-true must be excluded
in C8.1b.

For the sake of simplicity, we have stated C8.1 only for the case that h
is a sentence in a finite system or a nongeneral sentence in the infinite
system. However, C8.1 is valid also if h is a general sentence in the
infinite system except in the case where h is almost L-implied by e (D58-1c)
with respect to any of the c-functions on which 𝔆 is based. In the latter
case, c(h,e) = 1 although not ⊢ e ⊃ h; thus here again c is not increased
and hence i is not positively relevant. (This holds if positive relevance is
defined by D65-1a; see, however, the subsequent remark concerning the
alternative definition D′.)


We considered in § 65 the following example in the infinite system: h is
‘(∃x)Px’, i is ‘Pb’, ‘t’ is taken as e. We mentioned that, for certain c-functions,
e.g., c*, h is almost L-true. Although in every finite system the c of h on ‘t’ is
increased by the addition of i, in the infinite system c(h,t) is already 1 and hence
is not increased by the addition of i. Therefore i is here irrelevant to h; i is not
confirming evidence for h.

Cases of the kind of this example suggested the alternative definition (D′)
for positive relevance indicated in § 65. If this alternative definition is chosen,
then in cases like the above example i is called positive to h and hence is
regarded as confirming evidence for h. Then the restriction of h to nongeneral
sentences in the infinite system in C8.1 can be omitted. (But the restricting
conditions in (a) that not ⊢ e ⊃ h and in (b) that h is not L-true remain.)

The equivalence conditions for hypotheses (H8.22) and for observation
reports (H8.4) are obviously valid, because the corresponding principles
hold for all regular c-functions (T59-1i and h). For 𝔆, the former
condition can be generalized; the hypotheses h and h′ need only be L-equivalent
with respect to e, i.e., ⊢ e ⊃ (h ≡ h′) (cf. T59-2).

The consequence condition H8.2 and the special consequence condition
H8.21 are not valid, as we shall see. In his discussion of H8.21, Hempel
refers (p. 105, n. 1) to William Barrett ([Dewey], p. 312), whose view that
“not every observation which confirms a sentence need also confirm all
its consequences” is obviously in contradiction to the consequence
condition. Barrett supports his view by pointing to “the simplest case: the
sentence ‘C’ is an abbreviation of ‘A . B’, and the observation O confirms
‘A’, and so ‘C’, but is irrelevant to ‘B’, which is a consequence of ‘C’ ”.
This situation can indeed occur, as we shall see; thus Barrett is right in
rejecting the consequence condition. Now Hempel points out that
Barrett, in the phrase “and so ‘C’ ” just quoted, seems to presuppose tacitly
the converse consequence condition: if e confirms h, then it confirms also
any sentence of which h is a consequence. Hempel shows correctly that a
simultaneous requirement of both the consequence condition and the
converse consequence condition would immediately lead to the absurd result
that any observation report e confirms any hypothesis h (because e
confirms e, hence e . h, hence h). Since he accepts the consequence condition,
he rejects the converse consequence condition. On the other hand, Barrett,
accepting the latter, rejects the former. Each of the two incompatible
conditions has a certain superficial plausibility. Which of them is valid?
The answer is, neither.

In our investigation of the possible relevance situations for two
hypotheses (§§ 70, 71) we found the following results, which hold for all
regular c-functions. It is possible that, on the same evidence e, which may
be factual or tautological, i is positive to h but negative to h ∨ k, although
the latter is L-implied by the former. This is possible not only if i is
negative to k but also if i is irrelevant or even positive to k (§ 71, case 4a).
We have indicated there a general procedure for constructing cases of this
kind, and given a numerical example (§ 71, example for 4a). This shows
that the consequence condition is not valid, that is, not in accord with any
regular c-function. We have further found that it is possible that i is
positive to h but negative to h . k, although the latter L-implies the former.
This is possible even if i is positive to k (§ 71, case 3a). Here likewise a
general construction procedure has been indicated and a numerical
example given (§ 71, example for 3a). This shows that the converse consequence
condition is not valid.
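Both relevance situations can be exhibited with small numbers. The following sketch is not the numerical example of § 71; it uses two hypothetical regular m-functions, m_a and m_b, over the eight state-descriptions of a system in which h, k, and i are taken as atomic sentences, with the tautology as prior evidence. The weights are illustrative assumptions, chosen so that i is positive to h and to k in both cases.

```python
from fractions import Fraction


def make_c(weights):
    """Build c from a regular m-function given as positive weights on the
    state-descriptions (h, k, i); c(hyp, ev) = m(ev . hyp) / m(ev)."""
    def c(hyp, ev=lambda s: True):
        den = sum(w for s, w in weights.items() if ev(s))
        num = sum(w for s, w in weights.items() if ev(s) and hyp(s))
        return Fraction(num, den)
    return c


def H(s):
    return s[0] == 1


def K(s):
    return s[1] == 1


def I(s):
    return s[2] == 1


def H_or_K(s):
    return H(s) or K(s)


def H_and_K(s):
    return H(s) and K(s)


# Keys are (h, k, i) truth values; every weight is positive (regularity).
m_a = {(1, 1, 1): 40, (1, 0, 1): 5, (0, 1, 1): 5, (0, 0, 1): 50,
       (1, 1, 0): 5, (1, 0, 0): 30, (0, 1, 0): 30, (0, 0, 0): 35}
m_b = {(1, 1, 1): 5, (1, 0, 1): 40, (0, 1, 1): 40, (0, 0, 1): 15,
       (1, 1, 0): 30, (1, 0, 0): 5, (0, 1, 0): 5, (0, 0, 0): 60}

c_a, c_b = make_c(m_a), make_c(m_b)

# Against the consequence condition: i raises h and k but lowers h v k.
case_4a = (c_a(H, I) > c_a(H), c_a(K, I) > c_a(K), c_a(H_or_K, I) < c_a(H_or_K))

# Against the converse consequence condition: i raises h and k but lowers h . k.
case_3a = (c_b(H, I) > c_b(H), c_b(K, I) > c_b(K), c_b(H_and_K, I) < c_b(H_and_K))
```

With m_a, c(h,i) = 9/20 exceeds c(h,t) = 2/5 while c(h ∨ k,i) = 1/2 falls below c(h ∨ k,t) = 23/40; with m_b, c(h . k,i) = 1/20 falls below c(h . k,t) = 7/40.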

A remark made by Hempel in his discussion of Barrett is interesting
because it throws some light on the reasoning which led Hempel to the
consequence condition. Hempel quotes Barrett’s statement that “the
degree of confirmation for the consequence of a sentence cannot be less than
that of the sentence itself”. This statement is correct; it does indeed hold
for every regular c-function (T59-2d). Hempel agrees with this principle
but regards it as incompatible with a renunciation of the special
consequence condition, “since the latter may be considered simply as the
correlate, for the non-gradated [i.e., classificatory] relation of confirmation, of
the former principle which is adapted to the concept of degree of
confirmation”. This seems to show that here Hempel has in mind as explicandum
the following relation: ‘the degree of confirmation of h on i is greater than
r’, where r is a fixed value, perhaps 0 or 1/2. This interpretation seems
indicated also by another remark which Hempel makes in support of the
consequence condition: “An observation report which confirms certain
hypotheses would invariably be qualified as confirming any consequence
of those hypotheses. Indeed: any such consequence is but an assertion of
all or part of the combined content of the original hypotheses and has
therefore to be regarded as confirmed by any evidence which confirms the
original hypotheses” (p. 103). This reasoning may appear at first glance
quite plausible; but this is due, I think, only to the inadvertent transition
to the explicandum mentioned above. This relation, however, is not the
same as our original explicandum, the classificatory concept of
confirmation as used, for instance, by a scientist when he says something like this:
‘The result of the experiment just made supplies confirming evidence for
my hypothesis’. Hempel’s general discussions give the impression that he
too is originally thinking of this explicandum, when he refers to favorable
and unfavorable data, both of which are regarded as relevant and
distinguished from irrelevant data, and when he speaks of given evidence as
strengthening or weakening a given hypothesis. The difference between
the two explicanda is easily seen as follows. Let r be a fixed value. The
result that the degree of confirmation of h after the observation i is q > r
does not by itself show that i furnishes a positive contribution to the
confirmation of h; for it may be that the prior degree of confirmation of h (i.e.,
before the observation i) was already q, in which case i is irrelevant; or it
may have been even greater than q, in which case i is negative. [Example.
Let h be ‘P₁b ∨ P₂b’, and i ‘P₃a’. Take r = 1/2. For many c-functions
c(h,i) = c(h,t) = 3/4. Therefore i is (initially) irrelevant to h, although
c(h,i) > 1/2.] And, the other way round, the result that the posterior
degree of confirmation of h is higher than the prior one does not necessarily
make it higher than r (unless r = 0). Thus we see that the essential
criterion for the concept of confirming evidence must take into account not
simply the posterior degree of confirmation but rather a comparison
between this and the prior one.
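The bracketed example can be computed directly. The sketch below rests on several assumptions: the subscripts of the example are read as h = ‘P₁b ∨ P₂b’ and i = ‘P₃a’; the system has exactly three primitive predicates and the two individuals a and b; and two particular c-functions are tried, one based on equal weights for all state-descriptions (called m_dagger here) and one based on equal weights for all structure-descriptions (m_star). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A Q-predicate fixes the truth values of (P1, P2, P3) for one individual.
Q = list(product([0, 1], repeat=3))
# A state-description assigns a Q-predicate to each of the individuals (a, b).
STATES = list(product(Q, repeat=2))
N_STRUCTURES = len({tuple(sorted(s)) for s in STATES})


def m_dagger(state):
    """Equal weight for every state-description."""
    return Fraction(1, len(STATES))


def m_star(state):
    """Equal weight for every structure-description, shared equally among
    the isomorphic state-descriptions belonging to it."""
    q_a, q_b = state
    n_isomorphic = 1 if q_a == q_b else 2
    return Fraction(1, N_STRUCTURES * n_isomorphic)


def c(m, h, e=lambda s: True):
    """c(h,e) = m(e . h) / m(e)."""
    den = sum(m(s) for s in STATES if e(s))
    num = sum(m(s) for s in STATES if e(s) and h(s))
    return num / den


def h(s):  # 'P1b v P2b'
    return s[1][0] == 1 or s[1][1] == 1


def i(s):  # 'P3a'
    return s[0][2] == 1


results = {
    name: (c(m, h), c(m, h, i))  # (prior, posterior)
    for name, m in (("dagger", m_dagger), ("star", m_star))
}
```

Under these assumptions both functions give the pair (3/4, 3/4): the posterior value exceeds r = 1/2, yet it equals the prior value, so that i is irrelevant to h.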

The consistency condition H8.3 is not valid; it seems to me not even 
plausible. The special condition H8.31, requiring compatibility of the 
hypothesis with the evidence, is certainly valid. We restate it here in the 
general form as Compatibility Condition: 

(C8.31) Compatibility Condition.

a. If i and h are L-exclusive with respect to e, that is, if e . i . h is
L-false, then not 𝔆(h,i,e).

b. If i and h are L-exclusive, that is, if i . h is L-false, then not 𝔆₀(h,i).


The following theorem shows that C8.31 is valid, no matter on which
c-function or class of c-functions 𝔆 is based.


T87-2.
a. If a relation 𝔆 holds in any instance excluded by C8.31a, then it is
not in accord with any regular c-function.
Proof. Let e . i . h be L-false. Then, for every regular c, c(h,e . i) = 0
(T59-1e), hence not > c(h,e).
b. If a relation 𝔆₀ holds in any instance excluded by C8.31b, then it is
not in accord with any regular c-function. (From (a), with ‘t’ for e.)


On the other hand, the second special condition H8.32 seems to me in- 
valid. Hempel himself shows that a set of physical measurements may 
confirm several quantitative hypotheses which are incompatible with each 
other (p. 106). This seems to me a clear refutation of H8.32. Hempel dis- 
cusses possibilities of weakening or omitting this requirement, but he 
decides at the end to maintain it unchanged, without saying how he in- 
tends to overcome the difficulty which he has pointed out himself. Per- 
haps he thinks that he may leave aside this difficulty because the results 
of physical measurements cannot be formulated in the simple language L 
to which his analysis applies. However, it seems to me that there are
similar but simpler counterexamples which can be formulated in our systems 
𝔏 and in Hempel’s system L. For instance, let i describe the frequency of 
a property M in a finite population, and h and h’ state two distinct values 
m and m’ for the frequency of M in a sample of s individuals belonging to 
the population, such that the relative frequencies m/s and m’/s are both 
near to the relative frequency of M in the population as stated in i. Then 
i confirms both h and h’, although they are incompatible with each other. 
Example. Let i be a statistical distribution (D26-6c) for M and non-M with 
respect to 10,000 individuals with the cardinal number 8,000 for M. Let h be a 
statistical distribution with respect to 100 of these individuals with the cardinal 
number 80 for M, and similarly h’ with respect to the same individuals and with 
the cardinal number 79. Note that a statistical distribution for a finite class has 
the form of a disjunction of conjunctions and does not contain variables or the 
sign of identity; therefore it occurs also in L and it is an observation report (in 
the wider sense). Let e be either the tautology ‘t’ or a factual sentence irrelevant 
to h and to h’ (on ‘t’ and on i). Then for many c-functions (presumably
including all adequate ones) c(h, e . i) > c(h,e) and c(h’, e . i) > c(h’,e). (These are 
cases of the direct inductive inference, see § 94.) Thus i is positively relevant 
and hence constitutes confirming evidence for both h and h’. 
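
The figures of this example can be checked numerically under assumptions that go beyond what is stated here. The sketch below assumes that, for a symmetrical c-function, the direct inference takes the familiar hypergeometric form (the text defers the relevant theorems to § 94). The two priors used for comparison are likewise only illustrative choices: equal weight for each of the 101 possible sample frequencies, and equal weight for each individual distribution of the sample. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def direct_inference(pop_size, pop_m, sample_size, sample_m):
    """Hypergeometric value: exactly sample_m of the sampled individuals are M,
    given that pop_m of the pop_size individuals in the population are M."""
    return Fraction(
        comb(pop_m, sample_m) * comb(pop_size - pop_m, sample_size - sample_m),
        comb(pop_size, sample_size),
    )

def prior_equal_frequencies(sample_size):
    """Illustrative prior: every sample frequency 0..sample_size is equally weighted."""
    return Fraction(1, sample_size + 1)

def prior_equal_individual_distributions(sample_size, sample_m):
    """Illustrative prior: every assignment of M / non-M to the sample is equally weighted."""
    return Fraction(comb(sample_size, sample_m), 2 ** sample_size)

# i: 8,000 of 10,000 individuals are M; h: 80 of the 100 sampled; h': 79 of them.
c_h = direct_inference(10_000, 8_000, 100, 80)
c_h_prime = direct_inference(10_000, 8_000, 100, 79)

print(float(c_h), float(c_h_prime))
print(c_h > prior_equal_frequencies(100), c_h_prime > prior_equal_frequencies(100))
```

Under either illustrative prior both posterior values are higher than the prior ones, although h and h’ cannot both be true; their two values together remain below 1.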

Hempel mentions in this context still another condition, which might 
be called the Conjunction Condition: if e confirms each of two hypotheses, 
then it also confirms their conjunction (p. 106). Hempel seems to accept 
this condition; he regards any violation of it as “intuitively rather
awkward”. However, this condition is not valid for our explicandum; we have 
found earlier that i may be positive both to h and to k but negative to 
h . k (see § 71, case 3a and the example for it; this was mentioned above 
as a refutation of the converse consequence condition). And it is not valid 
for the second explicandum either, no matter which value we choose for r. 

This is seen as follows. Let r be any real number such that 0 ≤ r < 1. Let q 
be (1 − r)/2; hence q > 0. Let i say that in a given finite population the
relative frequencies are as follows: for ‘P₁ . P₂’, r; for ‘P₁ . ~P₂’, q; for ‘~P₁ . P₂’, 
q; hence for ‘~P₁ . ~P₂’, 0; for ‘P₁’, r + q; for ‘P₂’, r + q. Let h be ‘P₁b’ and 
h’ ‘P₂b’, where b belongs to the population. Then (as we shall see later, T94-1e) 
for every symmetrical c-function and hence for every adequate one, the
following holds. c(h,i) = c(h’,i) = r + q > r; on the other hand, c(h . h’,i) = r. 
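
The arithmetic of this counterexample can be checked with exact fractions. The sketch assumes, as the text announces for symmetrical c-functions, that the c-value of a singular hypothesis about a member of the population equals the stated relative frequency; the function name and the dictionary keys are hypothetical.

```python
from fractions import Fraction

def conjunction_counterexample(r):
    """Return (c(h,i), c(h',i), c(h.h',i)) for the frequencies described above,
    assuming c of a singular hypothesis equals the stated relative frequency."""
    r = Fraction(r)
    if not 0 <= r < 1:
        raise ValueError("r must satisfy 0 <= r < 1")
    q = (1 - r) / 2
    freq = {
        ("P1", "P2"): r,
        ("P1", "~P2"): q,
        ("~P1", "P2"): q,
        ("~P1", "~P2"): Fraction(0),
    }
    assert sum(freq.values()) == 1  # the four cells exhaust the population
    c_h = freq[("P1", "P2")] + freq[("P1", "~P2")]        # frequency of P1
    c_h_prime = freq[("P1", "P2")] + freq[("~P1", "P2")]  # frequency of P2
    c_both = freq[("P1", "P2")]                           # frequency of P1 . P2
    return c_h, c_h_prime, c_both

for r in (Fraction(0), Fraction(1, 2), Fraction(9, 10)):
    c_h, c_h_prime, c_both = conjunction_counterexample(r)
    # Each hypothesis exceeds r, their conjunction does not.
    print(r, c_h > r, c_h_prime > r, c_both > r)
```

For every admissible r the loop prints True, True, False: each hypothesis exceeds the threshold while their conjunction only reaches it.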

What may be the reasons which have led Hempel to the consistency 
conditions H8.32 and H8.3? He regards it as a great advantage of any 
explicatum satisfying H8.3 “that it sets a limit, so to speak, to the strength 
of the hypotheses which can be confirmed by given evidence”, as was 
pointed out to him by Nelson Goodman. This argument does not seem to 
have any plausibility for our explicandum, because a weak additional evi- 
dence can cause an increase, though a small one, in the confirmation even 
of a very strong hypothesis. But it is plausible for the second explicandum 
mentioned earlier: the degree of confirmation exceeding a fixed value r. 
Therefore we may perhaps assume that Hempel’s acceptance of the
consistency condition is due again to an inadvertent shift to the second
explicandum. This assumption seems corroborated by the following result. 
Although H8.32 is not valid for our explicandum, it is valid for the second 
explicandum if we take for r 1/2 or any greater value (< 1). For if h and h’ 
are L-exclusive, then it is impossible that c(h,i) and c(h’,i) both exceed 
1/2, because the sum of those two c-values is c(h ∨ h’,i) (according to the 
special addition theorem, T59-11), and hence cannot exceed 1. 


§ 88. Hempel’s Definition of Confirming Evidence 


Hempel defines a concept Cf as an explicatum for confirming evidence, and 
he shows that Cf fulfils his conditions of adequacy, which we discussed in the 
preceding section. It is found that Cf is too narrow as an explicatum for the 
general concept of confirming evidence, but it seems adequate as an explicatum 
for the special case where the evidence shows that all observed individuals have 
the property referred to in the hypothesis. 

This concludes the discussion of the concept of confirming evidence. 


On the basis of his analysis of the problem of an explication of the con- 
cept of confirming evidence, Hempel proceeds to construct the definition 
of a dyadic relation Cf between sentences, which he proposes as an expli- 
catum. (His construction is given in technical details in [Syntactical], pp. 
130-42, and briefly outlined in [Studies], p. 109.) We shall briefly state the 
series of definitions, using our terminology and notation and omitting 
minor details not relevant for our discussion. We add again ‘H’ to the 
numbers in the latter article and call the first definition ‘H9.0’. e is any 
molecular sentence, h any sentence of Hempel’s language system L earlier 
indicated (similar to 𝔏 but without a sign of identity). 

(H9.0) The development of h for a finite class C of individual constants 
=Df the sentence formed from h by the following transformations: (1) 
every universal matrix (i_k)(M_i) is replaced by the conjunction of the
substitution instances of its scope M_i for all in in C; (2) every existential 
matrix (∃i_k)(M_i) is replaced by the disjunction of the substitution
instances of its scope M_i for all in in C. (If h contains no variables, then its 
development is h itself.) 

(H9.1) Cfd(e,h), e directly confirms h =Df e L-implies the development 
of h for the class of those in which occur essentially in e (i.e., which occur 
in every sentence L-equivalent to e). 

(H9.2) Cf(e,h), e confirms h =Df h is L-implied by a class of sentences 
each of which is directly confirmed by e. 


Example. Let e be ‘Pa₁ . Pa₂ . … . Pa₁₀’, l ‘(x)Px’, and h ‘Pa₁₂’. Then 
Cfd(e,l); and, since ⊢ l ⊃ h, Cf(e,h); but not Cfd(e,h). 

(H9.3) e disconfirms h =Df e confirms non-h. 

(H9.4) e is neutral with respect to h =Df e neither confirms nor
disconfirms h. 
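
The definitions of development and direct confirmation can be tried out mechanically on small cases. The following is a minimal sketch, not Hempel's own construction: formulas are encoded as nested tuples (an encoding chosen here for illustration), L-implication between quantifier-free sentences is tested by truth tables, and every constant written in e is simply treated as occurring essentially, which is a simplifying assumption. All function names are hypothetical.

```python
from itertools import product

# Formulas are nested tuples (an illustrative encoding, not Hempel's own):
#   ("atom", "P", "a1")  ("not", f)  ("and", f, g)  ("or", f, g)
#   ("all", "x", f)      ("some", "x", f)

def join(tag, parts):
    """Left-nested conjunction ("and") or disjunction ("or") of the parts."""
    result = parts[0]
    for part in parts[1:]:
        result = (tag, result, part)
    return result

def substitute(f, var, const):
    """Replace the free occurrences of the variable var in f by const."""
    tag = f[0]
    if tag == "atom":
        return ("atom", f[1], const if f[2] == var else f[2])
    if tag == "not":
        return ("not", substitute(f[1], var, const))
    if tag in ("and", "or"):
        return (tag, substitute(f[1], var, const), substitute(f[2], var, const))
    if f[1] == var:  # an inner quantifier rebinds var
        return f
    return (tag, f[1], substitute(f[2], var, const))

def development(f, constants):
    """Expand every quantifier over a finite, non-empty list of constants."""
    tag = f[0]
    if tag == "atom":
        return f
    if tag == "not":
        return ("not", development(f[1], constants))
    if tag in ("and", "or"):
        return (tag, development(f[1], constants), development(f[2], constants))
    instances = [development(substitute(f[2], f[1], c), constants) for c in constants]
    return join("and" if tag == "all" else "or", instances)

def atoms(f):
    """Atomic sentences of a quantifier-free formula."""
    if f[0] == "atom":
        return {f}
    return set().union(*(atoms(part) for part in f[1:]))

def holds(f, true_atoms):
    """Truth value of a quantifier-free formula under a truth assignment."""
    tag = f[0]
    if tag == "atom":
        return f in true_atoms
    if tag == "not":
        return not holds(f[1], true_atoms)
    if tag == "and":
        return holds(f[1], true_atoms) and holds(f[2], true_atoms)
    return holds(f[1], true_atoms) or holds(f[2], true_atoms)

def l_implies(premise, conclusion):
    """Truth-table test of L-implication for quantifier-free sentences."""
    letters = sorted(atoms(premise) | atoms(conclusion))
    for values in product([False, True], repeat=len(letters)):
        true_atoms = {a for a, v in zip(letters, values) if v}
        if holds(premise, true_atoms) and not holds(conclusion, true_atoms):
            return False
    return True

def directly_confirms(e, h):
    """Direct confirmation, simplified: every constant written in e counts as essential."""
    constants = sorted({atom[2] for atom in atoms(e)})
    return l_implies(e, development(h, constants))

e = join("and", [("atom", "P", f"a{k}") for k in range(1, 11)])
l = ("all", "x", ("atom", "P", "x"))
h = ("atom", "P", "a12")

print(directly_confirms(e, l))  # True: e L-implies the development of l
print(directly_confirms(e, h))  # False: h is its own development
# In a finite system containing a12, l L-implies h, so e confirms h via the class {l}.
domain = [f"a{k}" for k in range(1, 13)]
print(l_implies(development(l, domain), h))  # True
```

The sketch tests confirmation only for a class of sentences supplied by hand (here the single sentence l); a general search over all classes of directly confirmed sentences is not attempted.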

Now let us see whether the concept Cf defined by H9.2 seems adequate 
as an explicatum for our explicandum, the concept of confirming evidence. 
Hempel shows that Cf satisfies all his conditions of adequacy earlier 
stated. While he takes this fact as an indication of adequacy, it will make 
us doubtful, since we found that some of the requirements are invalid. 

It follows from our refutation of the special consequence condition 
H8.21 and the special consistency condition H8.32 that no R can possibly 
fulfil all of the following four conditions: 


(i) R is not clearly too wide (in the sense of § 86), 
(ii) R is not clearly too narrow, 
(iii) R satisfies H8.21, 
(iv) R satisfies H8.32. 


For if (ii) and (iii) are fulfilled, then our counterexamples to H8.21 lead 
to cases where R holds but the explicandum does clearly not hold; hence 
(i) is not fulfilled. And if (i) and (iv) are fulfilled, then our counterexamples 
to H8.32 lead to cases which are excluded by H8.32 but in which the
explicandum clearly holds; hence (ii) is not fulfilled. 

Since Hempel has shown that his explicatum Cf satisfies all his require- 
ments, among them H8.21 and H8.32, Cf must be either clearly too wide 
or clearly too narrow or both. I am not aware of any cases in which Cf 
holds but the explicandum does clearly not hold. Thus we may assume, 
unless and until somebody finds counterinstances, that Cf is not clearly 
too wide. However, it is clearly too narrow; we shall see, indeed, that Cf 
is limited to some quite special kinds of cases of the explicandum. The 
result that a proposed explicatum is found too narrow constitutes a much 
less serious objection than the result that it is too wide. In the former case 
the proposed concept may still be useful; it may be an adequate explica- 
tum for a subkind of the explicandum within a limited field. It seems that 
this is the case with Cf. 

We shall now consider the four most important kinds of inductive rea- 
soning as explained earlier (§ 44B) and examine, for each of them, under 
what conditions Cf holds. In the following discussion the population is 
assumed to be finite. Individuals not referred to in the evidence e are 
called new individuals. ‘rf’ means relative frequency. (In (1) and (2) we 
restrict the present discussion, for the sake of simplicity, to a hypothesis h 
concerning one individual.) 

1. Direct inference. e is a statistical distribution (D26-6c) to the effect 
that the rf of a property in the population, say, the primitive property P, 
has the value r; h is ‘Pb’, where b belongs to the population. 

1a. Let r be 1; that is, all individuals in the population are known to 
be P. Then Cf holds, but this case is trivial because e L-implies h. 

1b. Let 0 < r < 1. Cf does not hold. However, if r is close to 1, most 
people would regard e as confirming evidence for h. This holds even for 
both explicanda: (i) c is increased by adding e to ‘t’; (ii) c(h,e) exceeds the 
fixed value q, say 1/2. (We shall see later (T94-1e) that for every
symmetrical c, and hence for every adequate c, c(h,e) = r.) 

2. Predictive inference. e is a statistical distribution to the effect that 
the rf of a property, say, P, in a given sample is r; h is the singular
prediction ‘Pd’, where d is a new individual. 

2a. Let r be 1; that is, all individuals in the observed sample have been 
found to be P. Then Cf holds (see the above example following H9.2). 

2b. Let 0 < r < 1. Cf does not hold. However, if r is close to 1, most 
people would regard e as confirming evidence for h, in the sense of either 
explicandum (as in 1b). (For any adequate c-function, in the case of a
sufficiently large sample c(h,e) is close or equal to r.) 

Example. Let e and h be as in the earlier example following (H9.2) and 
i ‘~Pa₁₁’. (i is negative to h on e.) Then not Cf(e . i, h). 

2c. Let the evidence contain, in addition to e with r = 1, irrelevant 
data on additional individuals. Then Cf does not hold. 

Example. Let e and h be as above, and i’ be ‘P₂a₁₁’. (i’ is irrelevant to h on e.) 
Then not Cf(e . i’, h). However, for every adequate c, c(h, e . i’) = c(h,e).
Therefore, since e is regarded as confirming evidence for h, e . i’ will usually be
regarded so too. 

3. Inverse inference. e is a statistical distribution to the effect that the 
rf of P in a given sample of a population is r; h is a statistical distribution 
saying that the rf of P in the population is r’. 

3a. Let r and r’ be 1, that is, all individuals in the sample and in the 
population are stated to be P. Here Cf(e,h) holds, and even Cfd(e,h) (see 
Cfd(e,l) in the example following H9.2). 

3b. Let 0 < r < 1. Then Cf holds for no value of r’. However, for r’ 
equal or near to r, many people, though not all, would regard e as
confirming evidence for h. 

4. Universal inductive inference. Let h be a universal sentence, say 
‘(x)Mx’, and e be a conjunction of sentences concerning the individuals 
of a given sample not containing negative instances. (If ‘b’ occurs
essentially in e, it is called a positive instance for h if e L-implies ‘Mb’, a 
negative instance if e L-implies ‘~Mb’, and a neutral instance if it is 
neither a positive nor a negative instance.) 

4a. Let e contain only positive instances. Then Cf and even Cfd hold. 
(This case is the same as 3a.) 

4b. Let e contain both positive and neutral instances. Then Cf does
not hold. 

Example. Let l, e, and i’ be as previously. Then not Cf(e . i’, l). However, 
many will regard i’ as irrelevant to l on e, that is, c(l, e . i’) = c(l,e). Since now e 
is regarded as confirming l, that is, c(h,e) > c(h,t), c(h, e . i’) is likewise > c(h,t). 
Hence e . i’ will be regarded as confirming h. 

Thus we see that in each of the kinds of inductive inference just dis- 
cussed Cf holds only in the special case where the evidence ascribes to all 
individuals essentially occurring in it the property in question. Although 
this case is of great importance, it is very limited. In the great majority 
of the cases in which scientists speak of confirming evidence, the rf in e 
is not 1 or 0 but has an intermediate value. These cases are not covered 
by Cf. However, Cf can presumably be regarded as an adequate explica- 
tum for the concept of confirming evidence in the special case described. 

Hempel’s investigations of the problem of confirming evidence supplied 
the first thoroughgoing and clear analysis of the whole problem complex. 
As such they remain valuable independently of his attempted solution of 
the particular problem of finding a nonquantitative explicatum for the 
concept of confirming evidence. The latter problem is today no longer as 
important as it was at the time Hempel made his investigations. He
himself has defined, in the meantime, in collaboration with others, an
interesting concept dc, proposed as an explicatum for degree of confirmation (see 
Hempel and Oppenheim [Degree], and Helmer and Oppenheim [Degree]); 
this will be discussed in a later chapter (in Vol. II). Some years ago those 
who worked on these problems expected that, if and when a definition of 
degree of confirmation were to be constructed, it would be based on a
definition of a nonquantitative concept of confirming evidence. However,
today it is seen that this is not the case either for Hempel’s definition of dc 
or for my definition of c*, and it is not regarded as probable that it will 
be the case for other definitions which will be proposed. It appears at 
present more promising to proceed in the opposite direction, that is, to 
define a quantitative form of the concept of confirming evidence on the 
basis of an explicatum for degree of confirmation, for instance, ℭ* (or ℭ₀*) 
based on c* (see (5) and (6) in § 86) or analogous concepts based on 
Hempel’s dc or on other explicata for degree of confirmation. 

This concludes the discussion of the classificatory concept of confirming
evidence. We have not found an adequate explicatum defined in
nonquantitative terms. The concepts which were considered as possible
explicata were found to be too narrow. However, we have a theory of
confirming evidence in quantitative terms. The general part of this theory, 
which refers to all regular c-functions, was constructed in the preceding 
chapter as the theory of relevance. Later we shall find specific results con- 
cerning relevance with respect to the function c*. 


CHAPTER VIII 
THE SYMMETRICAL c-FUNCTIONS 


In this chapter we return to quantitative inductive logic. A special kind of 
regular c-functions is introduced, called symmetrical c-functions. The definition 
is as follows. An m-function is called symmetrical (D90-1) if it has the same 
value for any state-descriptions which are isomorphic (D26-3a), i.e., such that 
one is constructed from the other by replacing individual constants with others. 
Then a c-function is called symmetrical (D91-1) if it is based upon a
symmetrical m-function. It is shown (T91-2) that any symmetrical c-function fulfils the 
requirement of invariance, that is to say, its value for two sentences is not 
changed if the individual constants occurring in the sentences are replaced 
with other ones. It seems generally, though tacitly, agreed that any adequate 
explicatum for probability, i.e., degree of confirmation, must fulfil this
requirement and hence be symmetrical. Theorems concerning symmetrical c-functions 
are developed (§§ 92-96), among them theorems concerning the direct
inductive inference, that is, the inference from the frequency of a property in a
population to its frequency in a sample (§ 94). (The other inductive inferences will 
be dealt with only in later chapters, because they presuppose the choice of a 
particular c-function.) The classical formulas of the binomial law (§ 95) and of 
Bernoulli’s theorem (§ 96) are here construed as approximations for special 
cases of the direct inference. 

This chapter presupposes §§ 25-27 of the earlier chapter on deductive logic. 


§ 90. Symmetrical m-Functions 


It seems plausible to require that an adequate concept of degree of confirma- 
tion should treat all individuals on a par. Those c-functions which fulfil this 
requirement will later (§ 91) be called symmetrical. As a preliminary step 
toward this concept we define here (D1) symmetrical m-functions as those 
regular m-functions which ascribe to any two isomorphic (D26-3a)
state-descriptions the same value. 


In the preceding chapter we have discussed the two nonquantitative 
concepts of confirmation, viz., the comparative concept 𝔐ℭ and the
classificatory concept ℭ. Now we return to the investigation of the quantitative 
concept, the concept of degree of confirmation. This investigation was 
begun in chapter v. There we introduced the general concept of regular 
functions and stated theorems which hold indiscriminately for all 
c-functions, no matter whether or not they are adequate explicata for our 
explicandum, the quantitative concept of probability, or degree of
confirmation. In the present chapter we strengthen the assumptions
underlying our system of inductive logic. Our final aim will be to choose one 
particular c-function as our explicatum. This will be done in a later
chapter (in Vol. II); there we shall define the function c* and take it as the 
basis of our system of quantitative inductive logic. In the present chapter 
we shall take only an intermediary step; we shall select a certain kind of 
regular c-functions, which we call the symmetrical c-functions. This kind, 
although considerably narrower than the general class of regular
c-functions, still comprehends an infinite number of c-functions with greatly 
varying characters. It seems to me that the property of symmetry
characterizing this class is very plausible and has indeed been tacitly
presupposed by all authors on probability₁. 

In this chapter we shall make use of some more of the material ex- 
plained in chapter iii on deductive logic. As indicated at the beginning of 
that chapter, §§ 14-20 were already presupposed for chapters iv-vii, 
while §§ 21-24 list well-known theorems not for reading but for the pur- 
pose of later references. In the present chapter we shall, in addition, use 
the content of §§ 25-27. Especially the following concepts explained in 
those sections will often be used: division (D25-4), isomorphism of
sentences (D26-3) and especially of state-descriptions ℨ (§ 27), individual 
and statistical distributions (D26-6), structure (§ 27) and
structure-description (Str, D27-1). 

The following consideration will lead us to the concept of symmetrical 
c-functions. Suppose X has found by observation that the individuals a 
and b are P; the individuals may be physical objects and P may be an 
observable property. Let e be the sentence expressing these results: 
‘Pa. Pb’. X considers two hypotheses h and h’; h is the prediction that 
another object c is likewise P (‘Pc’), and h’ says the same for still another 
object d (‘Pd’). If X has chosen a concept c of degree of confirmation, he 
will ascribe a certain value to c(h,e). We cannot determine this value
generally because it depends upon the choice of c. Different functions c, even 
if each of them appears as not implausible, may yield different numerical 
values for the given case. However, we shall expect that if X ascribes a 
certain value to c(h,e), no matter which value this may be, he will ascribe 
the same value to c(h’,e). We should find it entirely implausible if he were 
to ascribe different values here; that is to say, we should not regard such 
a function c as an adequate explicatum. The reason is that the logical 
relation between e and h is just the same as that between e and h’.
Although the individuals c and d may, of course, be very different in their 
empirical properties, their logical status cannot be different. The evidence 
e does not say anything about either c or d; therefore, if e is all the relevant 
evidence available to X, he has no rational reasons to expect h more than 
h’ or vice versa. If we consider a case where the same individual constant 
occurs both in the evidence and in the hypothesis, the result will be
fundamentally the same; but we have, of course, to take care that the individual 
constant is replaced in the same way in both sentences. Thus, for example, 
we shall require that c(Pc, Pc ∨ Pa) and c(Pd, Pd ∨ Pa) have the same 
value. To put it in very general terms, we require that logic should not 
discriminate between the individuals but treat all of them on a par;
although we know that individuals are not alike, they ought to be given 
equal rights before the tribunal of logic. This is never questioned in
deductive logic, although it is seldom stated explicitly. For example, since 
‘Pc’ L-implies ‘Pc ∨ Pa’, ‘Pd’ L-implies ‘Pd ∨ Pa’. This important
character of deductive logic is stated in general terms in the theorem of the
invariance of the L-concepts (T26-2). What we require here is that
inductive logic should have the same character. However, this requirement is 
not fulfilled by all regular c-functions. We shall call those regular
c-functions which fulfil the requirement of nondiscrimination among individuals 
symmetrical c-functions. For reasons of technical expediency, we shall not 
define this concept directly by the characteristic just indicated. Instead, 
we shall first define the concept of symmetrical m-functions by an
analogous characteristic (D1). Later we shall define the symmetrical
c-functions as those based on symmetrical m-functions (D91-1); and then we 
shall show that they fulfil our requirement (theorem of invariance, T91-2). 

Before we come to the definition, we state here a simple theorem
concerning the m of a structure-description. 

T90-1. Let m be a regular m-function with respect to 𝔏_N. Let Str_j be 
any Str in 𝔏_N, and ℨ_i be any ℨ belonging to Str_j (D27-1a). Then m(Str_j) 
is the sum of the m-values for all those ℨ which belong to Str_j, in other 
words, for all those ℨ which are isomorphic to ℨ_i. (From D55-2b, T27-2f.) 

Our intention is to characterize the symmetrical m-functions as those 
which treat all individuals on a par. Now, two isomorphic ℨ differ only 
in their references to different individuals (§ 27); exactly the same
properties and relations which the one attributes to a, b, c, etc., the other 
attributes to, say, d, b, a, etc. Thus, to treat all individuals on a par 
amounts to treating isomorphic ℨ on a par. This leads us to the following 
definition; it refers only to finite systems 𝔏_N; the extension to 𝔏_∞ will be 
made later (D91-3). 


+D90-1. m is a symmetrical measure function (or briefly, a symmetrical
m-function) for the ℨ in 𝔏_N =Df m is a regular m-function for the ℨ in 
𝔏_N (D55-1), and m has for isomorphic ℨ the same value (i.e., if ℨ_i is 
isomorphic to ℨ_j, then m(ℨ_i) = m(ℨ_j)). 

The following definitions D2, D3, and D4 are analogous to the
corresponding definitions concerning regular m-functions: D55-2, D57-3, and 
D57-4, respectively. 


+D90-2. Let m be a symmetrical m-function for the ℨ in 𝔏_N. We
extend m to a symmetrical m-function for the sentences in 𝔏_N in the
following way. 
a. For any L-false sentence j in 𝔏_N, m(j) =Df 0. 
b. For any non-L-false sentence j in 𝔏_N, m(j) =Df the sum of the 
values of m for the ℨ in ℜ_j (the range of j). 
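
The two definitions just given can be illustrated on a toy system. The sketch below is only an illustration under stated assumptions: the system has three individuals and one primitive predicate P, a state-description is modelled as a tuple of truth values, a sentence is modelled by its range (a test on state-descriptions), and "regular" is taken to mean that all values are positive and sum to 1. The three sample m-functions and all names are hypothetical choices.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 3  # individuals 0, 1, 2 in a toy system with one primitive predicate P

# A state-description is a tuple of booleans: z[k] says whether individual k is P.
STATE_DESCRIPTIONS = list(product([False, True], repeat=N))

def structure(z):
    """Isomorphic state-descriptions differ only by a permutation of the
    individuals, so the sorted tuple identifies the structure."""
    return tuple(sorted(z))

def is_regular(m):
    """Assumed reading of regularity: positive values that sum to 1."""
    return all(m[z] > 0 for z in STATE_DESCRIPTIONS) and sum(m.values()) == 1

def is_symmetrical(m):
    """Regular, and equal values for isomorphic state-descriptions."""
    return is_regular(m) and all(
        m[z1] == m[z2]
        for z1 in STATE_DESCRIPTIONS
        for z2 in STATE_DESCRIPTIONS
        if structure(z1) == structure(z2)
    )

def m_of_sentence(m, sentence):
    """Sum of m over the range of the sentence (0 if the range is empty)."""
    return sum((m[z] for z in STATE_DESCRIPTIONS if sentence(z)), Fraction(0))

# Equal weight for every state-description.
m_equal_states = {z: Fraction(1, 2 ** N) for z in STATE_DESCRIPTIONS}
# Equal weight for every structure, shared equally inside a structure.
m_equal_structures = {
    z: Fraction(1, (N + 1) * comb(N, sum(z))) for z in STATE_DESCRIPTIONS
}
# Regular but not symmetrical: individual 0 is favoured.
m_biased = {
    z: (Fraction(3, 4) if z[0] else Fraction(1, 4)) * Fraction(1, 2 ** (N - 1))
    for z in STATE_DESCRIPTIONS
}

print(is_symmetrical(m_equal_states), is_symmetrical(m_equal_structures))
print(is_regular(m_biased), is_symmetrical(m_biased))
print(m_of_sentence(m_equal_structures, lambda z: z[0]))   # a sentence like 'Pa'
print(m_of_sentence(m_equal_structures, lambda z: False))  # an L-false sentence
```

The biased function is regular but gives different values to two state-descriptions that differ only in which individual is P, so it fails the symmetry test.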


D90-3. A sequence of functions ₁m, ₂m, ₃m, etc., is a fitting symmetrical 
sequence of m-functions for ℨ or, briefly, a fitting symmetrical m-sequence 
for ℨ =Df the sequence is a fitting m-sequence for ℨ (D57-3), and, for 
every N, the function _Nm in the sequence is a symmetrical m-function 
for the ℨ in 𝔏_N (D1). 


D90-4. A sequence of functions ₁m, ₂m, ₃m, etc., is a fitting symmetrical 
sequence of m-functions for sentences or, briefly, a fitting symmetrical
m-sequence for sentences =Df the sequence is a fitting m-sequence for
sentences (D57-4), and, for every N, the function _Nm in the sequence is a 
symmetrical m-function for the sentences in 𝔏_N (D2). 


§ 91. Symmetrical c-Functions 


The symmetrical c-functions are defined (D1) as those which are based upon 
symmetrical m-functions. The value of a symmetrical m- or c-function in 𝔏_∞ is 
defined (D3, D4) as the limit of the values in the finite systems 𝔏_N, as
previously (§ 56). Then it is shown (theorem of invariance T2) that the value of 
a symmetrical m- or c-function remains unchanged if the individual constants 
involved are replaced by any other ones. Thus the symmetrical functions 
fulfil the requirement stated in the preceding section. It seems that all authors 
on probability₁ tacitly accept that requirement. 


We have earlier (D55-3) said of a function c that it is based upon a 
function m, if c(h,e) is always m(e . h)/m(e). It seems natural, in analogy 
to our earlier procedure (D55-4), to define the symmetrical c-functions as 
those based upon symmetrical m-functions (D1). The first items in this 
section will be restricted to 𝔏_N; the concepts for 𝔏_∞ will be defined later 
(D3, D4). 


+D91-1. c is a symmetrical confirmation function or, briefly, a
symmetrical c-function, for 𝔏_N =Df c is based upon a symmetrical m-function 
for the sentences in 𝔏_N. 


From this definition the following theorem follows which is analogous 
to T55-2. 

T91-1. Let c be a symmetrical c-function for 𝔏_N. Then there is a
symmetrical m for the sentences of 𝔏_N (namely that upon which c is based) 
such that the following holds. 

a. For any pair of sentences h,e in 𝔏_N, where m(e) ≠ 0,
c(h,e) = m(e . h)/m(e). (From D1, D55-3.) 

b. c has a value for a pair of sentences h,e in 𝔏_N if and only if m(e) ≠ 0, 
hence if and only if e is not L-false in 𝔏_N. (From D1, D55-3, D90-2.) 


The following definition is analogous to D57-5. 


D91-2. A sequence of functions ₁c, ₂c, ₃c, etc., is a fitting symmetrical 
sequence of confirmation functions or, briefly, a fitting symmetrical
c-sequence =Df there is a fitting symmetrical m-sequence for sentences 
(D90-4) ₁m, ₂m, etc., such that, for every N, _Nc is the symmetrical
c-function based upon _Nm. 

We define the values of symmetrical m- and c-functions for the infinite 
system 𝔏_∞ as limits of their values for finite systems (D3 and D4). This 
is analogous to our earlier procedure for regular functions (D56-1 and 2). 
Here, as previously, ‘lim (. .)’ is meant as short for ‘lim_{N→∞} (. .)’, unless
otherwise indicated. 

+D91-3. Let ₁m, ₂m, etc., be a sequence of symmetrical m-functions 
for the sentences in 𝔏₁, 𝔏₂, etc. _∞m is the symmetrical m-function for the 
sentences in 𝔏_∞ corresponding to this sequence =Df for every sentence j in 
𝔏_∞ for which the limit exists, _∞m(j) = lim _Nm(j); if the limit does not exist, 
_∞m(j) has no value. 

D91-4. Let ₁c, ₂c, etc., be a sequence of symmetrical c-functions for 
𝔏₁, 𝔏₂, etc. _∞c is the symmetrical c-function for 𝔏_∞ corresponding to this
sequence =Df for any pair of sentences h,e in 𝔏_∞ for which the limit exists, 
_∞c(h,e) = lim _Nc(h,e); if the limit does not exist, _∞c(h,e) has no value. 

Since the symmetrical m-functions form a subclass of the regular m- 
functions, all theorems concerning the latter hold also for the former. 
Likewise, all theorems concerning regular c-functions hold also for the 
symmetrical c-functions. Thus we may here apply all the theorems of 
chapters v and vi; especially those of §§ 55, 57, and 59 will here be used. 
Further there are theorems which hold for the symmetrical functions but 
not for all regular functions. Some of them will here be stated. 

T91-2. Invariance of symmetrical functions with respect to
in-correlations. Let ₁m, ₂m, etc., be a fitting symmetrical m-sequence for sentences, 
and _∞m be the symmetrical m-function for the sentences in 𝔏_∞
corresponding to this sequence (D3); m may be either _Nm for any N or _∞m. 
Let ₁c, ₂c, etc., be the sequence of those c-functions which are based upon the 
given m-functions and hence are symmetrical c-functions; and let _∞c be 
the symmetrical c-function for 𝔏_∞ corresponding to this sequence (D4); 
c may be either _Nc for any N or _∞c. Let 𝔏 be any finite or infinite system. 
Let C be an in-correlation in 𝔏 (D26-1). Let i, h, and e be sentences in 𝔏, 
and let e not be L-false in 𝔏. Let i’ be the C-correlate of i (D26-2b), h’ that 
of h, and e’ that of e. (Hence e’ is likewise not L-false (T26-2c).) Then the 
following holds. 
a. m(i’) = m(i). 

Proof. I. For _Nm. 1. Let i be L-false. Then i’ too is L-false (T26-2c).
Therefore both m-values are 0 (D90-2a). 2. Let i not be L-false. Then i’ too is not 
L-false (T26-2c). Therefore the ranges of these two sentences, ℜ(i) and ℜ(i’), 
are not null. ℜ(i’) is the class of those ℨ in 𝔏_N which are constructed from the ℨ 
in ℜ(i) by C (T26-2a). Thus there is a one-one correlation between the ℨ in 
ℜ(i) and those in ℜ(i’) such that any two correlated ℨ are isomorphic (D26-4b, 
D26-3a) and hence have the same m-value (D90-1). Hence the assertion
(D90-2b). II. For _∞m. From (I), with D3 and T40-21e. 

+b. c(h’,e’) = c(h,e). (For _Nc, from T1a, (a). For _∞c, from D4, T40-21e.) 
c. Let h and e have no in in common; likewise h’ and e. Then c(h’,e) = 
c(h,e). 

Proof. We take a correlation C’ which, like C, correlates the in in h’ with 
those in h, but which correlates every in in e with itself. Then the assertion
follows from (b). 


T2b is especially important. It says in effect that values of symmetrical 
c-functions are invariant with respect to a transformation of the sentences 
by any in-correlation (where, of course, for both sentences the same in-cor- 
relation must be taken). This shows that our definition of symmetrical 
c-functions does fulfil its purpose; it characterizes those c-functions which 
treat all individuals on a par. T2 is the analogue in inductive logic to the 
theorem of invariance in deductive logic (T26-2), because the latter states 
the invariance of the L-concepts, the fundamental concepts of deductive 
logic, with respect to the transformations described. 

The principle of invariance seems to have been accepted by all authors 
on probability, both classical and modern, although it has hardly ever 
been expressed explicitly. All authors would, for instance, raise and
answer questions of the following kind: Suppose that among s observed 
objects there have been found s₁ with the property M and s₂ = s − s₁ 
with non-M; what is, on this evidence, the probability that another
object has the property M? Although nobody says so in so many words, it 
would presumably appear absurd to everybody to assume that the value 
of the probability on the evidence described depended also on the question 
which particular s individuals were observed and which particular other
individual was concerned in the prediction. For classical authors, this would 
appear simply as a consequence of the principle of indifference; but also 
those modern authors who reject the latter principle seem to take it for 
granted that in questions of the kind mentioned only the statement of 
the numbers but not a specification of the individuals is relevant for the 
probability. To put it in our terminology, there seems to be general
agreement among authors on probability₁ that no concept can be regarded as 
an adequate explicatum for probability₁ unless it possesses the
characteristic of symmetry. 

As examples for an application of T2 we may take those mentioned in
§ 90. If c is a symmetrical c-function, then the requirements there stated
are fulfilled: (1) c(Pc, Pa . Pb) = c(Pd, Pa . Pb), and (2) c(Pc, Pc ∨ Pa) =
c(Pd, Pd ∨ Pa). Both hold on the basis of the correlation that interchanges
‘c’ and ‘d’.


T91-3. Let m be a symmetrical m-function for the sentences in 𝔏_N, and
c be based upon m, hence a symmetrical c-function. Let 𝔷_i be an arbi-
trary 𝔷 in 𝔏_N, and 𝔖𝔱𝔯_j be the structure-description corresponding to 𝔷_i
(D27-1a). Let ζ_i be the number of those 𝔷 in 𝔏_N which are isomorphic
to 𝔷_i and hence belong to 𝔖𝔱𝔯_j. Then the following holds.
a. m(𝔖𝔱𝔯_j) = ζ_i × m(𝔷_i). (From D90-1, T90-1.)
b. c(𝔷_i,𝔖𝔱𝔯_j) = 1/ζ_i.
Proof. ⊢ 𝔷_i ⊃ 𝔖𝔱𝔯_j; hence ⊢ 𝔷_i . 𝔖𝔱𝔯_j ≡ 𝔷_i (T21-51(1)). Therefore
c(𝔷_i,𝔖𝔱𝔯_j) = m(𝔷_i)/m(𝔖𝔱𝔯_j) (D55-3). Hence, with
(a), the assertion.


§ 92. Theorems on Symmetrical c-Functions


Several theorems concerning symmetrical c-functions are stated, which
will later be used for the theory of inductive inferences.


The theorems of this section will later be used in the theory of inductive
inferences. They hold for any finite or infinite system 𝔏, provided the con-
ditions (A) and (B) stated at the beginning of § 59 (and a condition for
m-expressions analogous to (B)) are fulfilled.
Tra is analogous to a previous theorem concerning 3B (135-4); it will 


later be used for the predictive inference. 


T92-1. Let the predicates ‘M₁’, ‘M₂’, ..., ‘M_p’ form a division.
Let i and i′ be isomorphic individual distributions (D26-6a) for
n given individual constants with respect to the given division, and let j
be the corresponding statistical distribution (D26-6b). (Hence i and i′ are
disjunctive components in j.) Let n_m (m = 1 to p) be the number of full
sentences of ‘M_m’ in i (and hence likewise in every other disjunctive
component of j). Let d be the number of disjunctive components in
j (in other words, the number of individual distributions isomorphic
to i). Let h and e be any sentences not containing any of the individual
constants occurring in j, and let e be not L-false. Let m and c be sym-
metrical functions as in T91-2. Then the following holds. (Some of the
indications for proofs refer only to a finite system; in this case the same
assertions for the infinite system follow with the help of T57-5 or T57-6,
respectively.)

a. $d = \frac{n!}{n_1!\,n_2!\cdots n_p!}$. (From T40-32b.)

b. m(i′) = m(i). (From T91-2a.)

c. m(j) = d × m(i). (From T26-5b, T57-1n, (b).) (c) is analogous to
T91-3, but it holds also for the infinite system because, also in this
system, i and j refer only to a finite number n of individuals.

d. m(h . i) = m(h . i′).

Proof. The two conjunctions are isomorphic, because the second is con-
structed out of the first by that correlation which leads from i to i′ and leaves
all in not occurring in i unchanged. Hence the assertion (T91-2a).

e. m(h . j) = d × m(h . i).

Proof. Let j be i₁ ∨ i₂ ∨ ... ∨ i_d. Then h . j is L-equivalent (by distribution,
T21-5m(2)) to (h . i₁) ∨ (h . i₂) ∨ ... ∨ (h . i_d). The components in the latter
disjunction are L-exclusive in pairs (T26-5b) and have the same m-value (d).
Hence the assertion (T57-1n), in analogy to (c).

f. c(i,e) = c(i′,e). (From (d).)

g. c(j,e) = d × c(i,e). (From (e).)

h. c(h,i) = c(h,i′). (From (b), (d).)

+i. c(h,j) = c(h,i). (From (c), (e).)
j. $m(j) = \frac{n!}{n_1!\,n_2!\cdots n_p!} \times m(i)$. (From (c), (a).)
+k. Let e be not L-false and not contain any in occurring in j. Then
$c(j,e) = \frac{n!}{n_1!\,n_2!\cdots n_p!} \times c(i,e)$. (From (g), (a).)


(i) is of special importance. It shows that for a symmetrical c of a hy-
pothesis referring to other individuals than the evidence it makes no dif-
ference whether the evidence is an individual distribution or merely a
statistical distribution; in other words, even if the evidence specifies the
individuals, only their numbers are relevant for c. ((i) may be regarded as
a special case of the more general theorem T59-3c.)
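
The content of T1a, c, and i can be illustrated by brute-force enumeration. The following Python sketch is not part of the formal system; it assumes a toy language with four individual constants and a single property M (so that the division is M, non-M), and it uses two illustrative symmetrical measures: one giving every state-description the same value, and one giving every structure-description the same value and dividing it equally among its state-descriptions. All names (`m_star`, `m_uniform`, `m`, `c`) are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial

N = 4  # four individuals, indexed 0..3; a state is a tuple of 0/1 (non-M / M)

def m_star(z):
    """Symmetrical measure: equal weight per structure, shared equally inside it."""
    return Fraction(1, (N + 1) * comb(N, sum(z)))

def m_uniform(z):
    """Symmetrical measure: equal weight for every state-description."""
    return Fraction(1, 2 ** N)

def m(sentence, measure):
    """m-value of a sentence: sum of the measure over the states in its range."""
    return sum(measure(z) for z in product((0, 1), repeat=N) if sentence(z))

def c(h, e, measure):
    """c based upon m: c(h,e) = m(e . h) / m(e)."""
    return m(lambda z: h(z) and e(z), measure) / m(e, measure)

# i: individual distribution for the first three individuals (M, M, non-M).
i = lambda z: z[:3] == (1, 1, 0)
# j: the corresponding statistical distribution (two of the three are M).
j = lambda z: sum(z[:3]) == 2
# h: a hypothesis about the fourth individual, which does not occur in j.
h = lambda z: z[3] == 1

d = factorial(3) // (factorial(2) * factorial(1))  # T1a with n = 3, n1 = 2, n2 = 1

for measure in (m_star, m_uniform):
    print(measure.__name__, m(j, measure) == d * m(i, measure),
          c(h, j, measure) == c(h, i, measure))
```

For both measures the enumeration confirms m(j) = d × m(i) and c(h,j) = c(h,i), although the common value of c differs between the two measures; this is in line with the remark that only symmetry, not a particular choice of m, is used in T1.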


T92-3. Let ‘M₁’, ..., ‘M_p’ form a division. Let i be an individual dis-
tribution for s individual constants with respect to the given division with
the cardinal numbers s₁, s₂, ..., s_p; likewise i′ for s′ individual constants,
which do not occur in i, with the cardinal numbers s′₁, s′₂, ..., s′_p. Let j be
the statistical distribution corresponding to i, and j′ that corresponding
to i′. It is clear that i . i′ differs at most in the order of the conjunctive
components from an individual distribution for the s + s′ in with respect
to the given division with the cardinal numbers s₁ + s′₁, ..., s_p + s′_p;
let e be the corresponding statistical distribution. Then the following
holds. (Here again, the assertion for 𝔏∞ can be proved with the help of
T57-5 or 6.)

a. Lemma. The number of disjunctive components in j is
$\frac{s!}{s_1!\cdots s_p!}$, in j′ it is $\frac{s'!}{s'_1!\cdots s'_p!}$, and in e it is
$\frac{(s+s')!}{(s_1+s'_1)!\cdots(s_p+s'_p)!}$. (From T1a.)

b. $m(e) = \frac{(s+s')!}{(s_1+s'_1)!\cdots(s_p+s'_p)!} \times m(i\,.\,i')$.
(From (a), in analogy to T1j.)

c. Lemma. ⊢ j . j′ ⊃ e.
Proof. j . j′ says that, for every m (from 1 to p), s_m of the first s individuals
are M_m and s′_m of the second s′ individuals are M_m. From this it follows that of
the total of s + s′ individuals s_m + s′_m are M_m. This is what e says.

d. Lemma. ⊢ e . j ⊃ j′.
Proof. e says that, for every m, of the total of s + s′ individuals s_m + s′_m are
M_m. j says that of the first s individuals s_m are M_m. Therefore e . j entails that
of the second s′ individuals, since they are distinct from the first ones, s′_m are
M_m. This is what j′ says.

e. Lemma. ⊢ e . j ≡ j . j′. (From (c), (d).)

f. Lemma. ⊢ e . i ≡ i . j′.
Proof. 1. ⊢ i ⊃ j (T26-5c). Therefore, with (d), ⊢ e . i ⊃ i . j′. 2. Since
⊢ i ⊃ j, ⊢ i . j′ ⊃ j . j′. Hence, with (c), ⊢ i . j′ ⊃ e . i.

g. Lemma. $m(e\,.\,i) = m(i\,.\,j') = \frac{s'!}{s'_1!\cdots s'_p!} \times m(i\,.\,i')$.
Proof. The first equation follows from (f). Since i and j′ have no in in com-
mon, the second equation follows from T1e (with ‘i’ for ‘h’, ‘j′’ for ‘j’, and
‘i′’ for ‘i’) and (a).

h. (1) $m(j\,.\,j') = \frac{s!}{s_1!\cdots s_p!} \times m(i\,.\,j')$,
(2) $= \frac{s!}{s_1!\cdots s_p!} \times \frac{s'!}{s'_1!\cdots s'_p!} \times m(i\,.\,i')$.
Proof. Since j and j′ have no in in common, (1) follows from T1e (with ‘j′’
for ‘h’) and (a). (2) from (g).

+i. (1) $c(i,e) = \frac{s'!}{s'_1!\cdots s'_p!} \times \frac{(s_1+s'_1)!\cdots(s_p+s'_p)!}{(s+s')!}$. (From (g), (b).)

(2) $= \frac{\prod_m \genfrac{[}{]}{0pt}{}{s_m+s'_m}{s_m}}{\genfrac{[}{]}{0pt}{}{s+s'}{s}}$. (From (1), D40-3.)

+j. (1) $c(j,e) = \frac{s!}{s_1!\cdots s_p!} \times \frac{s'!}{s'_1!\cdots s'_p!} \times \frac{(s_1+s'_1)!\cdots(s_p+s'_p)!}{(s+s')!}$.
Proof. m(e . j) = m(j . j′) (from (e)). Therefore c(j,e) = m(j . j′)/m(e).
Hence, with (h)(2) and (b), the assertion.

(2) $= \frac{s!}{s_1!\cdots s_p!} \times c(i,e)$. (From (1), (i)(1).)

(3) $= \frac{\prod_m \binom{s_m+s'_m}{s_m}}{\binom{s+s'}{s}}$.


(We shall often state the value of a function in several forms, marked 
by figures under the same letter. The difference between them is often 
(as here with (1) and (2) under (i) and (1) and (3) under (j)) merely one 
of mathematical notation. Listing of various forms is often convenient 
for reference in proofs of later theorems.) 

T3i and j are the first theorems in our theory which state values of c (ex-
cept for the trivial values 0 and 1) absolutely, so to speak, that is to say,
not only in relation to other c-values but in such a manner that they show
how actually to compute the values for given sentences. Theorems of this
kind are possible at the present stage of our construction, i.e., in the theory
of symmetrical c-functions, only for those particular cases where the in-
dividuals referred to in the hypothesis occur already in the evidence, as is
the case in T3i and j. T3 will be used in § 94 for the direct inductive in-
ference.

Comparing the values T3i(2) and j(3), we find a striking similarity.
This is the first example of a relationship which we shall find in many
theorems later: if i is an individual distribution and j the corresponding
statistical distribution, then often c(i,e) is entirely expressed in terms of
the [ ]-function and c(j,e) in terms of the ( )-function with the same argu-
ments. This was our reason for the choice of the notation with ‘[ ]’ in
analogy to the customary notation with ‘( )’.
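
The parallel between the two forms can be checked numerically. The following Python sketch rests on an assumption: it reads the [ ]-function as the falling factorial n!/(n − s)!, which is the reading under which T3i(1) and T3i(2) agree, and it reads the ( )-function as the ordinary binomial coefficient. The function names and the cardinal numbers are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial, prod

def bracket(n, s):
    """Assumed reading of the [ ]-function: n!/(n - s)! = n(n-1)...(n-s+1)."""
    return factorial(n) // factorial(n - s)

def c_individual(s_counts, s2_counts):
    """T3i(2): c(i,e) as a quotient of [ ]-values."""
    s, s2 = sum(s_counts), sum(s2_counts)
    numerator = prod(bracket(a + b, a) for a, b in zip(s_counts, s2_counts))
    return Fraction(numerator, bracket(s + s2, s))

def c_statistical(s_counts, s2_counts):
    """T3j(3): c(j,e) as a quotient of binomial coefficients."""
    s, s2 = sum(s_counts), sum(s2_counts)
    numerator = prod(comb(a + b, a) for a, b in zip(s_counts, s2_counts))
    return Fraction(numerator, comb(s + s2, s))

def multinomial(counts):
    """s!/(s_1! ... s_p!), the number of disjunctive components in j (T3a)."""
    return factorial(sum(counts)) // prod(factorial(k) for k in counts)

# Illustrative cardinal numbers: i covers 3 individuals (2 M1, 1 M2),
# i' covers 3 further individuals (1 M1, 2 M2).
s_counts, s2_counts = (2, 1), (1, 2)
ci = c_individual(s_counts, s2_counts)
cj = c_statistical(s_counts, s2_counts)
print(ci, cj, cj == multinomial(s_counts) * ci)
```

With these numbers c(i,e) = 3/20 and c(j,e) = 9/20; their quotient is 3!/(2! 1!) = 3, as T3j(2) requires.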


§ 94. The Direct Inductive Inference 


The direct inference is that from the population to a sample. e states the
absolute frequency (af) n₁ of a property M in a population of n individuals;
hence the relative frequency (rf) is r₁ = n₁/n. The hypothesis h_st states the rf of M
in a sample. It is found that the c of h_st has its greatest value when the rf of M
in the sample is equal, or as near as possible, to the rf in the population. If the
hypothesis says that a certain individual of the population is M, then its c is r₁.
This shows a close connection between c (probability₁) and the rf in the popula-
tion (probability₂). The results here, in distinction to our later results con-
cerning other kinds of inductive inference, are in agreement with those of the
classical theory of probability.


The remainder of this chapter deals with the direct inference, which is
one of the principal kinds of inductive inference (§ 44B). This part of in-
ductive logic is located in this chapter because, in distinction to the theo-
ries of the other kinds of inductive inference, it does not require the choice
of a particular c-function but holds for all symmetrical c-functions.

The direct (or internal or downward) inference is the inference from
the population to a sample. Example: the evidence e says that among the
three million inhabitants of Chicago 80 per cent are born in America. A
sample of fifty persons is given; nothing is known about these persons
except that they are inhabitants of Chicago. The hypothesis h may say
that among these fifty persons forty (or forty-three, or between thirty-
seven and forty-three, or less than thirty) will be found to be born in
America. c(h,e) is to be determined.

We shall now formulate this kind of inference in general terms. We
consider a population of n individuals. This population is supposed to be
given by enumeration, that is, by a list of in. A division consisting of p
properties M₁, M₂, ..., M_p is given. Let the evidence e say that of these
n individuals n₁ have the property M₁, n₂ have M₂, ..., n_p have M_p.
For any i (from 1 to p), n_i is called the absolute frequency (af) of M_i in the
population. The relative frequency (rf) of M_i in the population is r_i =
n_i/n. e states only these frequencies of M_i without specifying which n_i in-
dividuals are M_i. Thus e is not an individual distribution but a statistical
distribution. Our problem of the direct inference refers to a sample from
the population containing s individuals. The given evidence e says noth-
ing about their properties; the individuals are merely given by a list of
individual constants, and from that it is seen that they belong to the
population. The hypothesis h says that s₁ specified individuals of this
sample are M₁, s₂ are M₂, ..., s_p are M_p. Thus, h is an individual dis-
tribution. Furthermore, we consider the corresponding statistical hy-
pothesis h_st which states for each M_i the same frequency s_i as h but does
not specify which s_i individuals are M_i. The problem of the direct infer-
ence is that of determining the c-values of h and especially h_st on e. The
solution of this problem for all symmetrical c-functions is given by the
subsequent theorem T1 which is a simple consequence of earlier re-
sults.

We mean here and further on by a sample simply a subclass of the
population without any qualification. It is customary in the statistical
theory of samples to regard a statistical inference with respect to a
sample as valid only if this sample is a random sample. A sample is called
a random sample if it has been selected from the population by a procedure
of such a kind that all individuals of the population have the same prob-
ability₂ of being selected, that is, such that in a prolonged application of
the procedure all individuals will be selected with equal frequencies. (For
original formulations of the definition see Venn [Logic], chap. v, and C. S.
Peirce [Theory], p. 454; for a modern formulation, e.g., Cramér [Sta-
tistics], p. 324.) The fact that this theory uses the concept of probability₂
instead of probability₁ has the effect that we can hardly ever know wheth-
er a sample that we have selected from a population is a random sample
in the defined sense. This was correctly pointed out by Keynes ([Probab.],
pp. 290 ff.). The validity of any inductive inference, even in practical ap-
plication, does not depend on the actual state of affairs, and certainly not
on any unknown frequencies; it depends merely on the given knowledge
situation or, more exactly speaking, on the logical relations between the
given evidence and the hypothesis. The individuals of the sample must
not be known to have any common property beyond what is said about
them in the evidence, or at least not any property that is relevant for the
hypothesis in question. This requirement, however, need not be mentioned
in any theorem of inductive logic as a qualifying condition. It concerns
not the theorem but its practical application to given knowledge situa-
tions; for this purpose, however, it is already implied by our requirement
of total evidence (§ 45B): the evidence must state all observational knowl-
edge that is actually available; in other words, a theorem is not directly
applicable to a knowledge situation in which the available observational
knowledge contains more than the evidence described in the theorem.
The main difficulty in the practical application of inductive theorems
consists in the fact that actual knowledge situations contain much more
than any of the simple evidences referred to in theorems. A theorem can
nevertheless be applied indirectly, provided the additional knowledge is,
at least approximately, irrelevant for the hypothesis in question. The re-
quirement that a random method be chosen for the selection of the sample
is not a rule of inductive logic but a methodological rule (§ 44A) intended
to assure the irrelevance of the known common property of the individu-
als of the sample for the hypothesis in question. The procedure used for
selecting the sample usually determines such a common property. For
example, suppose that the hypothesis concerns the distribution of political
opinions among the students at a certain university. If we take as sample
those students who major in history, then this fact constitutes a common
property which, taken together with previous experiences concerning
political opinions of history students, is not irrelevant for the hypothesis
in question. If, on the other hand, the sample is selected by a blind draw-
ing of lots, then the common property of being selected by this procedure
is irrelevant. Irrelevance is here meant, of course, not in the frequency
sense but in the inductive sense as discussed earlier (§ 65).

Although theorems like T1 do not involve the concept of randomness,
this concept is nevertheless of great importance for inductive logic, es-
pecially in application to random distributions and random order. Ran-
domness in this sense is the opposite to uniformity. Both concepts will be
discussed in Volume II.


+T94-1. Let 𝔏 be any finite or infinite system. Let the predicates
‘M_i’ (i = 1 to p) form a division. Let e be a statistical distribution for n
given in (in 𝔏) with respect to the division, with the cardinal number n_i for
‘M_i’. Let r_i = n_i/n. Let h be an individual distribution for s of the n in
in e (hence s ≤ n) with the cardinal number s_i for ‘M_i’. Let s_i ≤ n_i, be-
cause otherwise obviously ⊢ e ⊃ ~h and hence c(h,e) = 0. Let h_st be the
statistical distribution corresponding to h. Let c be any symmetrical c-
function for 𝔏. Then the following holds.


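a. (1) $c(h,e) = \frac{(n-s)!}{(n_1-s_1)!\cdots(n_p-s_p)!} \times \frac{n_1!\cdots n_p!}{n!}$;

(2) $= \frac{\prod_i \genfrac{[}{]}{0pt}{}{n_i}{s_i}}{\genfrac{[}{]}{0pt}{}{n}{s}}$.

Proof. From T92-3i by substituting ‘h’ for ‘i’, ‘n − s’ for ‘s′’, ‘n_i − s_i’ for
‘s′_i’. (This means that we apply the sentence i in the earlier theorem to the
sample and i′ to the remainder of the population.)


b. (1) $c(h_{st},e) = \frac{s!}{s_1!\cdots s_p!} \times c(h,e)$;

(2) $= \frac{\prod_i \binom{n_i}{s_i}}{\binom{n}{s}}$.

Proof. From T92-3j, by the same substitutions as for (a), and ‘h_st’ for ‘j’.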


c. Let p = 2; hence the division consists simply of M₁ and M₂ (which
is non-M₁). We consider the values of c(h_st,e) for constant s when s₁
runs through those values from 0 to s which are possible on evidence
e (that is, s₁ ≤ n₁ and, if n₂ < s, s₁ ≥ s − n₂). Let ‘q’ be short for
‘(s + 1)(n₁ + 1)/(n + 2)’. Then the following holds.
(1) If the interval from q − 1 to q contains only one integer which
is a possible value of s₁, say s*, then c(h_st,e) has its only maxi-
mum for s₁ = s*.
(2) If both q and q − 1 are integers and possible values of s₁, then
c(h_st,e) has two equal maxima for s₁ = q and s₁ = q − 1.
(3) If sr₁ is an integer, then c(h_st,e) has its only maximum for
s₁ = sr₁.

Proof. Let h′_st be like h_st but with the cardinal numbers s₁ − 1 instead of s₁ for
‘M’ and hence s₂ + 1 instead of s₂ for ‘~M’. Then (b(2)):
(A) c(h_st,e)/c(h′_st,e) = (s₂ + 1)(n₁ − s₁ + 1)/s₁(n₂ − s₂).
This value decreases with increasing s₁. It is 1 for s₁ = q. 1. If q is not an
integer, let q′ be the greatest integer smaller than q. Then the quotient (A) for
s₁ = q′ + 1 is < 1 because q′ + 1 > q. Therefore the c for s₁ = q′ + 1 is
smaller than for s₁ = q′; and likewise the c for any s₁ > q′ + 1. On the other
hand, the quotient (A) for s₁ = q′ is > 1 because q′ < q. Therefore the c for
s₁ = q′ − 1, and likewise for any smaller s₁, is smaller than for s₁ = q′. Thus c
has its only maximum for s₁ = q′. 2. If q is an integer and q and q − 1 are pos-
sible values of s₁, then c is equal for these values because (A) is 1 for s₁ = q, and
has its maxima for them. 3. It can easily be seen that q − 1 < r₁s < q. Hence
the assertion (3) from (1).


d. Let p = 2. For any m (from 0 to s), let h_m be the statistical distribu-
tion h_st with s₁ = m and s₂ = s − m; s remains unchanged. Then
$\sum_m [m \times c(h_m,e)] = s r_1$. (For m = 0, the product is 0; for those values
of s₁ which are not possible on e, c = 0. Therefore, the sum may be
restricted to the positive possible values of s₁.)

Proof. According to b(2) for p = 2, the sum mentioned is
$\sum_m \left[m \binom{n_1}{m}\binom{n_2}{s-m}\right] / \binom{n}{s}$.
Since $m\binom{n_1}{m} = n_1\binom{n_1-1}{m-1}$ (T40-8e), the numerator is, with l = m − 1:
$n_1 \sum_l \left[\binom{n_1-1}{l}\binom{n_2}{s-1-l}\right] = n_1\binom{n-1}{s-1}$ (T40-9c). Therefore the quotient is
n₁(n − 1)! s! (n − s)!/(s − 1)!(n − s)!n! (D40-2a), = sn₁/n = sr₁.

e. Corollary. Let p = 2, s = s₁ = 1, s₂ = 0; hence h_st is the same as h.
Then c(h,e) = n₁/n = r₁. (From (a(2)), T40-13a and b.)

T1a(2) and b(2) show again the analogy explained at the end of § 92.

T1c says that the c for h_st has a maximum if the rf (relative frequency)
of M in the sample is equal, or as near as possible, to the rf in the popula-
tion. Note that this holds only for the statistical distribution h_st, not for
the individual distribution h; this will be explained in connection with the
subsequent examples (§ 95).

We shall explain later (§ 99) that the estimate of a magnitude (in the
sense of the c-mean estimate) is the sum of its possible values each multi-
plied with the c for the hypothesis stating that value. Therefore the esti-
mate of the af (absolute frequency) s₁ of M in the sample, with respect to
the evidence e stating the rf of M in the population as r₁, is given by the
sum mentioned in T1d. Thus we see from T1d that the estimate of s₁ on e
is sr₁. Therefore the estimate of the rf in the sample is sr₁/s, that is, r₁, hence
equal to the rf in the population. We have seen (T1c) that the most prob-
able value of s₁ (i.e., that with the maximum c) is either sr₁ or, if this is
not an integer, it is an integer nearest to sr₁. The estimate of s₁, however,
is always exactly sr₁ even if this is not an integer. (The estimate of a mag-
nitude need not be one of its possible values; this will be explained later.)

T1e says that, for the hypothesis that a given individual belonging to
the population is M, c is r₁, that is, the rf of M in the population. For ex-
ample, if the evidence e says that four-fifths of the inhabitants of Chicago
are M, then the c or, in other words, the probability₁ that an inhabitant of
Chicago taken at random is M is four-fifths. Many, especially those ac-
customed to the frequency conception of probability, will perhaps be
tempted to say that this statement is quite trivial because by the prob-
ability of a Chicagoan being M we mean just the rf of M among Chicago-
ans. However, this judgment about the theorem is based upon a confusion
of the two concepts of probability; and the theorem is, in fact, very im-
portant and far from trivial. It is true that the probability₂ of a Chicagoan
being M means the rf of M among Chicagoans. The theorem T1e, how-
ever, does not speak about probability₂ but probability₁, that is, c. It says,
applied to the present example, that the probability₁ of a Chicagoan being
M with respect to a given evidence e is equal (not, as the probability₂, to
the actual rf, irrespective of whether anybody knows it or not, but) to
that value of the rf which is stated in the evidence e (irrespective of wheth-
er this is the actual value or not). Thus we see that, in the situation of the
direct inference, there is a very close connection between the rf of M in
the population as stated in e and the c for a full sentence of M, in other
words, between a known value of probability₂ and the value of proba-
bility₁. It was earlier mentioned (§ 42A (1)), that this connection makes
the historical shift in the meaning of the word ‘probability’ understand-
able; this word had first only the sense of probability₁ and later was used
in certain contexts in the sense of probability₂.
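
The assertions T1b(2), c, d, and e can be checked on a small case. The following Python sketch evaluates the formula of T1b(2) for p = 2 with exact fractions; the population numbers (n = 20, n₁ = 12) and the names `c_hst`, `values`, `mode`, and `estimate` are illustrative assumptions, not part of the theory.

```python
from fractions import Fraction
from math import comb

def c_hst(n, n1, s, s1):
    """T1b(2) for p = 2: c(h_st,e) = C(n1,s1) * C(n-n1, s-s1) / C(n,s).

    math.comb returns 0 when the lower argument exceeds the upper one,
    so values of s1 that are impossible on e receive c = 0."""
    return Fraction(comb(n1, s1) * comb(n - n1, s - s1), comb(n, s))

n, n1, s = 20, 12, 5               # illustrative population and sample size
r1 = Fraction(n1, n)               # rf of M in the population: 3/5

values = {s1: c_hst(n, n1, s, s1) for s1 in range(s + 1)}
total = sum(values.values())       # the h_st for all s1 are exhaustive and exclusive

q = Fraction((s + 1) * (n1 + 1), n + 2)              # the q of T1c
mode = max(values, key=values.get)                   # s1 with the greatest c
estimate = sum(s1 * v for s1, v in values.items())   # the sum of T1d

# A sample size for which s*r1 is not an integer (s = 4, s*r1 = 12/5).
estimate_4 = sum(k * c_hst(n, n1, 4, k) for k in range(5))

single = c_hst(n, n1, 1, 1)        # T1e: one individual, hypothesis "it is M"

print(total, q, mode, estimate, estimate_4, single)
```

Here q = 39/11, so the interval from q − 1 to q contains the single integer 3, which is where the maximum lies; the estimate for s = 5 is exactly sr₁ = 3, while for s = 4 it is 12/5, which is not a possible value of s₁.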

There can be no doubt that the direct inference is genuinely inductive 
in our sense, that is, nondeductive; it is obviously impossible to deduce the 
frequency in the sample from that in the population. (The fact that some 
authors call it a deductive probability inference is due, it seems to me, not 
to a difference in opinion, but merely to a difference in terminology.) 
Nevertheless, the direct inference is fundamentally different from the 
other inductive inferences in this point: all the individuals to which the 
hypothesis refers occur already in the evidence. In consequence of this, 
the direct inference is in certain respects more similar to the deductive 
inference than the other inductive inferences. First, the direct inference 
holds for all symmetrical c-functions alike. Thus it presupposes only that 
all individuals are treated on a par, but it is independent of the choice of a 
particular measure function. Further, it is independent of that charac- 
teristic of the property M which we call its logical width (§ 32). We shall 
later see that for all the other kinds of inductive inference the value of c
depends upon the choice of the c-function and hence of the underlying m,
and that for at least some c-functions, among them our function c*, the
value depends also upon the logical width of M. The theory of the other in-
ferences cannot be developed in a general form. Therefore, we postpone it
until after the definition of c*, and then we shall construct it for this function.

The results stated in T1 are in agreement with those given in the tradi-
tional calculus of probability based upon the classical conception, al-
though the results are usually interpreted in a different way. This agree- 
ment is a consequence of the fact that these results are independent of the 
m-function and independent of the width of M. (We shall see later that 
the results to which our system of inductive logic leads in the case of all 
other inductive inferences differ from the traditional results; the latter are 
valid only in certain special cases, for instance, as approximations in the 
case of sufficiently large samples or for certain kinds of properties.) 

For the direct inference and for most of the other inductive inferences
with the exception of the universal inference, the values of c are inde-
pendent of the total number N of individuals in the universe (though not
of the number n of individuals in the population, which is here regarded
as part of the universe). Therefore, the results hold for any finite or in-
finite system 𝔏.

A numerical example for T1 will be given at the end of the next section.


§ 95. The Binomial Law 


From the earlier theorem on the direct inference the binomial law in its 
classical form can be derived. It holds as an approximation for a large finite 
population. It holds further exactly either as limiting value for an infinite se- 
quence of increasing populations or for an infinite population. Some traditional 
uses of it are not admissible. Numerical examples for the theorems in this and 
the preceding sections are given. 


Because of the agreement between our theory of symmetrical c-func- 
tions and the traditional method, as far as the direct inference is con- 
cerned, we may follow the traditional method for some further steps, 
which are of a purely mathematical nature. These steps lead to certain 
results which hold as approximations for a sufficiently large population. 
These results are stated in the theorems of this and the next sections, lead- 
ing to the famous Bernoulli theorem. 


+T95-1.


Let 𝔏, ‘M_i’ (i = 1 to p), e, n, n_i, r_i, h, s, s_i, h_st, and c be as in T94-1.
Then the subsequent results hold in the following two senses (A) and (B).
A. If n is very large in relation to s, and likewise n_i to s_i for every i,
then the results hold approximately.

B. If for n → ∞, for every i, the rf n_i/n approaches to a limit r_i, then
the c-values stated are the limits to which c approaches.

a. $c(h,e) = r_1^{s_1} r_2^{s_2} \cdots r_p^{s_p}$. (From T94-1a(2), T40-174(3).)

b. $c(h_{st},e) = \frac{s!}{s_1!\,s_2!\cdots s_p!}\, r_1^{s_1} r_2^{s_2} \cdots r_p^{s_p}$. (From T94-1b(1), (a).)
+c. Binomial Law. For p = 2:

$c(h_{st},e) = \binom{s}{s_1} r_1^{s_1} r_2^{s_2}$. (From (b).)

d. As T94-1c, but with q = (s + 1)r₁. (From T94-1c, for n → ∞.)

(d) says here, as in the earlier theorem, that c(h_st,e) has its maximum
when s₁ is equal, or the nearest integer, to sr₁, hence when the rf of M in
the sample is equal, or as close as possible, to that in the population.
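
The sense (A) of the theorem can be made concrete by comparing the exact value of T94-1b(2) with the binomial value of T1c. The Python sketch below uses the Chicago example of § 94 (n = 3,000,000, r₁ = 4/5, s = 50, s₁ = 40) and, for contrast, a small population in which the sample is half of the whole; the second set of numbers and all function names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def c_exact(n, n1, s, s1):
    """T94-1b(2) for p = 2: exact value of c(h_st,e) for a finite population."""
    return Fraction(comb(n1, s1) * comb(n - n1, s - s1), comb(n, s))

def c_binomial(r1, s, s1):
    """T95-1c: the binomial value C(s,s1) * r1^s1 * r2^s2 with r2 = 1 - r1."""
    return comb(s, s1) * r1 ** s1 * (1 - r1) ** (s - s1)

r1 = Fraction(4, 5)

# Large population, small sample: the two values nearly coincide.
chicago_exact = c_exact(3_000_000, 2_400_000, 50, 40)
chicago_binom = c_binomial(r1, 50, 40)

# Small population, sample half as large as the population: they differ markedly.
small_exact = c_exact(10, 8, 5, 4)
small_binom = c_binomial(r1, 5, 4)

# T95-1d: with q = (s + 1) * r1 = 40.8 the maximum lies at s1 = 40.
binomial_mode = max(range(51), key=lambda k: c_binomial(r1, 50, k))

print(float(chicago_exact), float(chicago_binom))
print(small_exact, small_binom, binomial_mode)
```

For the small population the exact value is 5/9 while the binomial formula gives 256/625, which illustrates why the binomial law is only an approximation when the sample is not small in relation to the population.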

The binomial law of probability bears this name because of its rela-
tionship to the binomial theorem (T40-10a). Our formulation of it (T1c)
as a special case of the general law T1b refers to an evidence stating the
rf r₁ of a property M in a population to which the sample in question
belongs. The traditional formulation does not refer to this rf; it speaks of
r₁ rather as the probability of an individual’s being M; a restricting con-
dition is usually added to the effect that r₁ must be the probability of M
for each individual in the sample or each trial in the series of experiments,
“independently” of the other individuals. This independence is meant in
the sense that, even after some of the individuals have been observed, the
probability for any other one is still r₁. As Keynes has pointed out cor-
rectly ([Probab.], pp. 342 f.), this condition is very seldom fulfilled. If we
deal with a fixed population which is finite but very large, then the condi-
tion cannot be fulfilled exactly; and, although it is fulfilled with good ap-
proximation as long as the sample is small in relation to the population, it
is in general not even approximately fulfilled for large samples. The con-
dition is exactly fulfilled and therefore the theorem holds exactly for any
sample size s, if the population is infinite (see below) or if, after each ob-
servation of an event, the situation is rearranged so as to be like the origi-
nal one. In this case, we have to do, strictly speaking, with a series of
similar populations instead of one population. Consider the familiar ex-
ample of an urn containing n balls of which n₁ are known to have the color
M and n₂ non-M. A ball is drawn at random; its property is observed;
then the ball is replaced, the content of the urn is mixed, and then again
a ball is drawn, etc. The situation before the second drawing is similar to
that before the first in this essential point: we know again that there is a
population of n individuals of which n₁ are M and n₂ are non-M. There-
fore, the probability (c) for the second drawing yielding a ball that is M
is again n₁/n = r₁; thus the condition for the binomial theorem is exactly
fulfilled. Strictly speaking, the population for the second drawing is not
the same as that for the first. What we usually call the same ball a, at
the first and at the second time-point are, strictly speaking, two different
individuals a_t₁ and a_t₂ which stand in that relation which we may call
‘genidentity’ (Kurt Lewin); this is an empirical relation established by a
continuity of observation. We know from experience that under usual cir-
cumstances two genidentical ball-moments have the same color; and
therefore we assume that the frequency of M at the second time-point is
the same as at the first. However, this is obviously an inductive result
from many previous experiences. In order to be independent of these
earlier observations and to obtain a pure case for the binomial theorem,
we ought to count the number of balls with M and non-M again after
replacing the ball. This shows that the experiment described with replace-
ment of each ball drawn is not essentially different from an experiment in
which we take each time a new urn with new balls but such that the same
numbers n, n₁, and n₂ hold for each urn. The experiment with replace-
ment is, strictly speaking, likewise an experiment with a series of popula-
tions; the procedure of replacement is merely a convenient technical de-
vice to assure the constancy of the frequencies without the need of new
balls and repeated countings.

The theorem T1 is formulated only for finite populations (even (B) re-
fers only to an infinite sequence of finite populations, not to an infinite
population). But it holds likewise for an infinite population K∞, that is,
for the class of all individuals in 𝔏∞ or an infinite subclass of it. In this
case, r_i is to be defined, not as a quotient, but as the limit of n_i/n for
n → ∞ with respect to a given serial order of the elements of K∞. In our
language system 𝔏∞ there is no sentence e saying that the limit of rf is r_i;
therefore, we had to formulate our theorems for finite populations. [In
the metalanguage, however, variables for natural numbers and real num-
bers are available and hence the limit concept is expressible; we use it
often in definitions (e.g., for ∞c in D56-2) and theorems (e.g., here T1(B)).]
If a stronger language system is chosen which contains natural number
variables (for instance, the system 𝔏′ described in § 15B), then the limit
statement can be formulated in it (it is usually formulated with real num-
ber variables, but natural number variables are sufficient). The definition
of symmetrical c-functions can easily be adapted to this stronger system
(see the indications given in § 15B concerning form I of an extended in-
ductive logic). Then the binomial law and Bernoulli’s theorem can be
proved with respect to an evidence e which says that in the infinite popu-
lation the limit of the rf of M_i is r_i (in the case of 𝔏′, with respect to the
basic order of the individuals). In this case, the c-value stated by the
binomial law holds exactly.

However, in the case of an infinite population the binomial law (and
the same holds for Bernoulli’s theorem) involves a certain difficulty
concerning not its theoretical validity but rather the possibility of its
application. We have seen that if e says that the limit of rf of M is r_1,
then c has the value given by the theorem. Now, in order to apply any
inductive theorem to a given knowledge situation, e must represent the
knowledge available. Let us leave aside here the difficulty connected with
the requirement of total evidence (§ 45B), because this difficulty is not
specific to the present problem but is found in all applications of
inductive logic. In other words, let us assume that all observational
results about other things which the observer X may have are irrelevant
for h and hence may be omitted. Then the question remains as to how X can
ever possess evidence stating the exact value of the limit of rf of M. The
question I am raising here is not meant in the sense of the assertion that
statements concerning infinite sequences of events are meaningless because
it is not possible to know anything about an infinite sequence and, least
of all, about a limiting value for it. This assertion has sometimes been
made as an objection against the frequency theory of probability, because
this theory explicates probability_2 by the limit of the rf in an infinite
sequence. I agree with the frequentists in the view that the assertion is
too strong. If we were to require for knowledge absolute certainty, then
we would have to give up all claim to knowledge in science. If,
consequently, we admit also knowledge short of certainty, then we may
admit hypotheses concerning limits of rf just as well as other hypotheses
with the same degree of logical complexity (cf. [Testability] §§ 25 f.);
and then we may try to confirm hypotheses of this kind by observations. A
statement concerning the limit of rf is a confirmable hypothesis and hence
empirically meaningful. However, this does not solve our present
difficulty. Although the limit statement can be indirectly confirmed by
observational evidence, it is hardly possible to imagine a situation in
which the limit statement itself formulates the observational evidence
available to an observer X. Thus the binomial theorem for an infinite
population, although theoretically valid, can hardly ever be applied
directly to a given knowledge situation. Indirect application may still be
possible, as for any other inductive theorem; that means it may help in
establishing other theorems which in turn are directly applicable.

The binomial theorem has, since classical times, often been applied to
situations of the following kind, which are fundamentally different from
those referred to in our formulation T1c. The latter presupposes that r_1,
the rf of M in the population, is given in the evidence e. Now the
traditional way of reasoning is as follows: if the rf of M in the
population is not known, then we may take instead the rf of M in a past
series of observations. For example, suppose that we want to determine the
probability for the hypothesis h that in the next hundred throws with this
die there will be twenty aces. We do, of course, not know the rf of aces in
any population of which the next hundred throws constitute a sample. But
suppose we have earlier made two hundred throws and found among them
twenty-four aces, hence an rf of 0.12. Then the traditional procedure
consists in taking this value as the ‘probability’ r_1 to be used in the
binomial law and hence to give as c for h_st:
$\binom{100}{20} \times (0.12)^{20} \times (0.88)^{80}$.
This procedure is not admitted by the above formulation of the binomial
law (T1c), and I think it is incorrect. Inductive logic must, of course,
also provide a solution for this problem where the evidence states
statistical results concerning past events while the hypothesis states the
rf for a series of future events. Since the class described in e does not
comprehend the class described in h, this inductive inference is not from
the population to a sample but from one sample to another not overlapping
with the first. Therefore, it is not a case of direct inference but rather
of predictive inference. In a later chapter (in Vol. II) we shall deal with
this kind of inference and then give also a solution to the above problem
(cf. § 110C in this volume). We shall see that here the binomial formula
does not in general hold but only as an approximation under certain
restricting conditions which are even stronger than those for T1 and which
are not fulfilled in the example mentioned.


Numerical Examples for the Direct Inference. 

First Example, for T94-1, with p = 2, ‘M’ and ‘~M’. We consider a sample
with s = 7 from a small population: n = 14, n_1 = 10; hence n_2 = 4,
r_1 = 5/7, r_2 = 2/7. We consider all possible values for s_1. Since the
population contains 4 individuals with non-M, the sample cannot contain
more; hence s_2 ≤ 4, s_1 ≥ 3.

Let the population contain the individuals a_1, a_2, ..., a_14, and the
sample the individuals a_1, ..., a_7. h is the prediction that s_1
specified individuals of the sample, say a_1, ..., a_{s_1}, are M and the
others non-M. h_st is the weaker prediction that exactly s_1 of the seven
mentioned individuals of the sample are M, no matter which they are.
According to T94-1a(1),
c(h,e) = 7! 10! 4! / [(10 − s_1)! (s_1 − 3)! 14!];
according to T94-1b(1), c(h_st,e) = $\binom{7}{s_1}$ × c(h,e). This yields
the values stated in the subsequent table. We see that, in accordance with
T94-1c(3), c for h_st has its only maximum for s_1 = s r_1 = 5, that is,
for the case in which the rf of M in the sample is the same as that in the
population.

Second Example, for T95-1. We take p = 2 and s = 7 as in the first
example. We make no specific assumption concerning the size of the
population; we presuppose merely that it is very large in relation to the
sample (say, at least n = 500). We take, as in the first example,
r_1 = 5/7. If the population is infinite, this means that the limit of the
rf of M with respect to a given serial order of the individuals is 5/7; in
this case, the computed values for c hold exactly. If the population has a
finite size n, it means that the rf of M (n_1/n) is 5/7; in this case the
values of c hold approximately. According to T95-1a,
c(h,e) = (5/7)^{s_1} × (2/7)^{7 − s_1}. In this example, in distinction to
the first, all values of s_1 from 0 to 7 are possible. The results are
shown in the following table. According to T95-1c, the c-values for h_st
are again found by multiplying with $\binom{7}{s_1}$. These values are
shown in the last column of the table.


           FIRST EXAMPLE: n = 14         SECOND EXAMPLE: LARGE n
                                         (Binomial Law)
  s_1      c(h,e)      c(h_st,e)         c(h,e)       c(h_st,e)
   0       0           0                 0.000155     0.000155
   1       0           0                 0.000388     0.00272
   2       0           0                 0.000971     0.0204
   3       0.00100     0.0350            0.00243      0.0850
   4       0.00700     0.245             0.00607      0.212
   5       0.0210      0.441             0.0152       0.319
   6       0.0350      0.245             0.0380       0.266
   7       0.0350      0.0350            0.0948       0.0948
  Sum                  1.001                          1.000


In the second example we find again, in accordance with T94-1d, that c for
h_st has its maximum for s_1 = s r_1 = 5. Note that this holds only for the
statistical distribution h_st, not for the individual distribution h. For
the latter, in the case p = 2, c increases always with increasing s_1 and
has its maximum when all individuals in the sample are M. This holds always
if r_1 > 1/2; if r_1 < 1/2, the maximum holds for s_1 = 0; if r_1 = 1/2, c
is the same for all values of s_1. These results are plausible. If we take
s_1 + 1 instead of s_1 and hence s_2 − 1 instead of s_2, then the new value
of c(h,e) (T95-1a with p = 2) is r_1/r_2 times the earlier one; and this
ratio is > 1 if r_1 > 1/2. In the second example, r_1/r_2 = 5/2. And this
is indeed the ratio of each value in the column for c(h,e) to the preceding
value. (In the first example the situation is not so simple because of the
small size of the population. When six individuals have been found to be M,
then we see from e that among the remaining eight there are four with M and
four with non-M. Therefore it is equally probable for the seventh
individual to be M or non-M. This is the reason why c(h,e) has the same
value for s_1 = 6 and s_1 = 7.)
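
The two columns of each example can be recomputed mechanically. The following is a minimal sketch in Python, not part of the original text; the function names are chosen here for illustration only, and exact fractions are used so that the rounding in the table can be checked.

```python
from fractions import Fraction
from math import comb


def c_individual_finite(n, n1, s, s1):
    """c(h,e) for a finite population: s1 specified sample members are M.

    Equals n1!/(n1-s1)! * n2!/(n2-s2)! * (n-s)!/n!, written here as a
    ratio of binomial coefficients.
    """
    n2, s2 = n - n1, s - s1
    if not (0 <= s1 <= n1 and 0 <= s2 <= n2):
        return Fraction(0)
    return Fraction(comb(n - s, n1 - s1), comb(n, n1))


def c_statistical_finite(n, n1, s, s1):
    """c(h_st,e): exactly s1 of the s sample members are M (any which)."""
    return comb(s, s1) * c_individual_finite(n, n1, s, s1)


def c_individual_binomial(r1, s, s1):
    """c(h,e) under the binomial law: r1^s1 * r2^(s - s1)."""
    return r1 ** s1 * (1 - r1) ** (s - s1)


def c_statistical_binomial(r1, s, s1):
    """c(h_st,e) under the binomial law."""
    return comb(s, s1) * c_individual_binomial(r1, s, s1)


if __name__ == "__main__":
    r1 = Fraction(5, 7)
    for s1 in range(8):
        row = (
            c_individual_finite(14, 10, 7, s1),
            c_statistical_finite(14, 10, 7, s1),
            c_individual_binomial(r1, 7, s1),
            c_statistical_binomial(r1, 7, s1),
        )
        print(s1, *(f"{float(x):.6f}" for x in row))
```

Computed with exact fractions, the c-values for h_st add up to exactly 1 in both examples; the figure 1.001 in the table appears to be an effect of rounding the individual entries.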


Bernoulli’s Theorem 


Bernoulli’s theorem in its classical form holds as an approximation for the 
direct inference, if the sample is large and the population still much larger or 
even infinite. Bernoulli’s limit theorem says that the c for the assumption that 
the relative frequency of a property M in a sample lies within a fixed interval, 
however small, around its relative frequency in the population can be brought 
as close to 1 as required by making the sample sufficiently large.

We shall now follow still further the classical method as far as the
mathematical transformations are concerned, though not necessarily in
the interpretation. These transformations, due to Jacob Bernoulli and
Laplace, lead to the famous and important results stated in the following
theorem T1, which often are collectively called Bernoulli’s theorem
(sometimes even including the binomial law); sometimes this name is applied
in a more specific sense to T1e. Our interpretation of these results and
the details of our formulations in T1 are determined by the fact that we
locate the results within the framework of the theory of symmetrical
c-functions as special cases of the direct inference. (We omit here the
details of the mathematical transformations, because they have no bearing
on the interpretation and can be found in most textbooks on probability.)


+T96-1. Bernoulli’s Theorem. Let 𝔏, ‘M’ (for ‘M_1’, with ‘~M’ for
‘M_2’), e, n, n_1, n_2, r_1, r_2, s, s_1, s_2, h_st, and c be as in T94-1
but with p = 2. For abbreviation, let σ = √(s r_1 r_2) (this is the
standard deviation; cf. remarks on T105-1); δ = s_1 − s r_1 (this is the
deviation of s_1 from its estimate s r_1). Let h be a disjunction of
sentences of the form h_st but with different values of s_1, running from
s_{1,1} to s_{1,2} (s_{1,1} < s_{1,2}). Let δ_1 = s_{1,1} − s r_1 and
δ_2 = s_{1,2} − s r_1. Then the following approximations hold, provided
that the sample size s and even s r_1 r_2 is sufficiently large and the
population size n is very large in relation to s (and n_1 to s_1, and n_2
to s_2).


a. The normal law, concerning a single frequency.

$c(h_{st}, e) \approx \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\delta^{2}/(2\sigma^{2})} = \frac{1}{\sigma}\,\varphi\!\left(\frac{\delta}{\sigma}\right)$.

(The constants ‘π’ and ‘e’ have here their usual mathematical meaning,
while the variable ‘e’ refers to the evidence. For the normal function φ
see D40-4a and the table T40-20.)


Proof. From the binomial law (T95-1c) with the help of Stirling’s formula
(T40-4). The proof is given in the textbooks on probability or statistics.


b. Concerning any frequency interval. Let s_1 in h run through all values
(integers) from s_{1,1} to s_{1,2}, both included. Hence the disjunction h
says that s_1 belongs to the closed interval (s_{1,1}, s_{1,2}). Then the
following holds.

(1) $c(h,e) \approx \sum_{\delta=\delta_1}^{\delta_2} \frac{1}{\sigma}\,\varphi\!\left(\frac{\delta}{\sigma}\right)$.

(2) Rougher approximation:
$c(h,e) \approx \int_{u_1}^{u_2} \varphi(t)\,dt = \Phi(u_2) - \Phi(u_1)$,
where $u_1 = (\delta_1 - \tfrac{1}{2})/\sigma$;
$u_2 = (\delta_2 + \tfrac{1}{2})/\sigma$. ‘Φ’ denotes the probability
integral, see D40-4b and the table T40-20.

(3) Still rougher approximation:
as (2), but with u_1 = δ_1/σ, u_2 = δ_2/σ.

((1) from (a); (2) from (1) by an approximative transformation which
replaces a sum with a sufficient number of terms by an integral; (3) holds
as an approximation for (2), if the size of the interval, i.e.,
s_{1,2} − s_{1,1}, is sufficiently large.)

c. Concerning a symmetrical frequency interval. Let s_1 in h run from
s r_1 − δ (or the integer nearest to it) to s r_1 + δ. Hence h says that
s_1 does not deviate from s r_1 by more than δ to either side.

(1) c(h,e) ≈ 2Φ(u) − 1, where u = (δ + 1/2)/σ. (From (b)(2).)

(2) Rougher approximation: as (1), but with u = δ/σ. (From (b)(3).)

(3) Alternative formulation for (2). Let h say that the rf of M in
the sample is within the closed interval r_1 ± q, where q = δ/s.
Then c is as in (1), but with u = q √(s/(r_1 r_2)). (From (2).)

d. Let s_1 in h run through all possible values (from 0 to s) which deviate
from s r_1 by δ or more to either side.

(1) c(h,e) ≈ 2(1 − Φ(u)), where u = (δ − 1/2)/σ. (From (c)(1).)

(2) Rougher approximation: as (1), but with u = δ/σ. (From (c)(2).)

(3) Alternative formulation for (2). Let h say that the rf of M in
the sample deviates from r_1 by q or more to either side. Then c is
as in (1), but with u = q √(s/(r_1 r_2)). (From (2).)

Sometimes a table is given directly for the function 2(1 − Φ(u)); see,
e.g., Fry [Probab.], pp. 453-55, Cramér [Statistics], p. 558.


e. Bernoulli’s Limit Theorem. For a given r_1 and arbitrary positive
real numbers q and ε, there is a number s′ such that for every s ≥ s′,
c(h,e) > 1 − ε, where h says (as in (c)(3)) that the rf of M in the
sample of size s is within the interval r_1 ± q. In other words, if an
interval, however small, around r_1 is chosen for the rf of M in a
sample, c(h,e) can be brought as close to 1 as desired by making the
sample sufficiently large. (From (c)(3).)


The approximative results stated in these theorems are independent of
the size of the population; it is presupposed only that the population is
very large in comparison to the sample. Only r_1, the rf of M in the
population, is relevant for the results; it enters through
σ = √(s(r_1 − r_1²)). The theorems are here formulated only for finite
populations, for the reasons explained in the preceding section. The
results hold likewise for an infinite population with r_1 as the limit of
rf. But even in this case the results are only approximations, in
distinction to the binomial theorem. The accuracy of the approximations
increases with increasing sample size s; the requirement that not only s
but also s r_1 r_2 must be sufficiently large was stated by Keynes
([Probab.], p. 339). Formulas with closer approximations have been stated
by Laplace and later authors; they are given in textbooks on probability.

In the normal law (T1a), s r_1 (or the integer nearest to it) is that value
of the af (absolute frequency) of M in the sample for which c has its
maximum (T94-1c(3)). s r_1 is also the estimate of the af of M in the
sample. Therefore s_1 − s r_1 is the difference between that value of s_1
which is stated in the hypothesis h and the most probable value of s_1 (or
the estimate of s_1). The normal law states c as dependent merely upon the
square of this difference and thus as equal for positive and negative
deviations from s r_1. Thus the approximation given in the normal law gives
a distribution of c which is symmetrical at both sides of the maximum,
while the exact function (given for a finite population by T94-1b and for
an infinite population by the binomial law T95-1c) is not symmetrical.
However, the error made by the symmetrical function is small as long as
s_1 is not too far away from s r_1 (cf. Keynes, pp. 338 f., 358-61).

T1 deals with the case that the rf of M in the population is known as r_1.
It is presupposed that the sample size s is large and that the population
is even much larger still. T1a answers questions of the form: what is the c
for the assumption that s_1, the af of M in the sample, has such and such a
single value? The theorem gives this value of c as a function of the
difference between s_1 and its most probable value s r_1. On the other
hand, T1b, c, and d answer questions, not concerning a single value of s_1,
but intervals of such values. For large samples these questions concerning
intervals are more important than those concerning single values. The
content of the theorems is best seen from some numerical examples.


Example. Let us consider a sample of s = 2400 persons taken from the
population of Chicago with n = 3,000,000. Let us suppose that it has been
found in the census that the rf of a certain property M in this population
is r_1 = 0.6; hence r_2 = 0.4. Thus, s r_1 r_2 = 576; this is sufficiently
large, as required. σ = √(s r_1 r_2) = 24. The estimate and the most
probable value of the rf in the sample is the same as in the population:
r_1 = 0.6; and the estimate and the most probable value of the af s_1 in
the sample is s r_1 = 1440.

Example for T1a. Let h_st say that s_1 = 1452; hence δ = 12; δ/σ = 0.5.
Then T1a says that c for this hypothesis is (1/σ)φ(0.5). With the help of
the table T40-20 we find that this is 0.352/24 = 0.0147. The same c-value
holds for s_1 = 1428.
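
The value just found can be checked against the exact binomial term. The following Python sketch is an illustration added here, not part of the original text; the function names are hypothetical, and the exact term is evaluated through logarithms (math.lgamma) only to avoid floating-point underflow.

```python
from math import exp, lgamma, log, pi, sqrt


def phi(u):
    """Normal density function (the function called phi in T1a)."""
    return exp(-u * u / 2) / sqrt(2 * pi)


def normal_law(s, r1, s1):
    """Approximation T1a: c(h_st,e) ~ (1/sigma) * phi(delta/sigma)."""
    sigma = sqrt(s * r1 * (1 - r1))
    delta = s1 - s * r1
    return phi(delta / sigma) / sigma


def binomial_exact(s, r1, s1):
    """Exact binomial term C(s,s1) * r1^s1 * r2^(s-s1), via logarithms."""
    log_c = (
        lgamma(s + 1) - lgamma(s1 + 1) - lgamma(s - s1 + 1)
        + s1 * log(r1) + (s - s1) * log(1 - r1)
    )
    return exp(log_c)


if __name__ == "__main__":
    for s1 in (1428, 1440, 1452):
        print(s1, normal_law(2400, 0.6, s1), binomial_exact(2400, 0.6, s1))
```

The normal law returns the same value for s_1 = 1428 and s_1 = 1452, while the exact binomial terms differ slightly; this is the asymmetry of the exact function mentioned above.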


T1b refers to any interval for s_1. If the interval contains only few
values of s_1 we can determine c according to (1) by a sum extending over
the possible cases. If, however, the interval is not small, this procedure
becomes too cumbersome, and the use of the probability integral is much
more convenient. The form (2) gives a good approximation even for small
intervals. If the interval is sufficiently large, the simpler form (3) may
be used.

Example for T1b. We take s, n, r_1, r_2, and σ as above. Let δ_1 = −12,
δ_2 = 24; hence the interval for s_1 is (1428, 1464). This interval
contains 37 values. Therefore the use of T1b(1) would be very cumbersome.
Let us first use (2). u_1 = −12.5/24 = −0.52. Φ(u_1) = 1 − Φ(0.52)
(T40-19h), = 1 − 0.698 (table T40-20) = 0.302. u_2 = 24.5/24 = 1.02.
Φ(u_2) = 0.846. Hence c = 0.544. According to the rougher approximation
(3), u_1 = −12/24 = −0.5. Φ(u_1) = 1 − 0.691 = 0.309. u_2 = 1;
Φ(u_2) = 0.841. Hence c = 0.532. This value deviates from that obtained by
(2) by about 2 per cent.
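
The two approximations for an interval can be evaluated without a printed table. A minimal Python sketch follows, given here only as an illustration; the name interval_c and the flag continuity are introduced for this sketch, and the probability integral is obtained from the standard error function.

```python
from math import erf, sqrt


def Phi(u):
    """Probability integral (normal distribution function)."""
    return 0.5 * (1 + erf(u / sqrt(2)))


def interval_c(s, r1, delta1, delta2, continuity=True):
    """c for s_1 in [s*r1 + delta1, s*r1 + delta2].

    continuity=True  gives form (2), with the limits widened by 1/2;
    continuity=False gives the rougher form (3).
    """
    sigma = sqrt(s * r1 * (1 - r1))
    half = 0.5 if continuity else 0.0
    u1 = (delta1 - half) / sigma
    u2 = (delta2 + half) / sigma
    return Phi(u2) - Phi(u1)


if __name__ == "__main__":
    print(interval_c(2400, 0.6, -12, 24))                    # form (2)
    print(interval_c(2400, 0.6, -12, 24, continuity=False))  # form (3)
```

Because the function does not round the arguments u_1 and u_2 to two decimals, its results can differ from the table-based figures in the third decimal place.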

T1c is usually called Bernoulli’s theorem, and we shall follow this usage.
This is actually the form in which the theorem was formulated and proved
by Laplace ([Théorie], in 1812) while Jacob Bernoulli himself ([Ars],
published in 1713) stated the limit theorem (T1e). T1c deals with an
interval for the af of M, i.e., s_1, which extends equally on both sides of
s r_1 (forms (1) and (2)), or with an interval for the rf of M around r_1
(form (3)).

Example for T1c. Let us take δ = 24. Thus h says that s_1 is somewhere in
the interval 1440 ± 24 (that is, between 1416 and 1464). According to (1),
u = 24.5/24 = 1.02; Φ(u) = 0.846; c = 0.692. According to the rougher
approximation (2), u = 1; Φ(u) = 0.841; c = 0.682, which deviates from the
value under (1) by 1.5 per cent. (3) leads, of course, to the same value as
(2): q = 0.01; s/(r_1 r_2) = 10,000; √(s/(r_1 r_2)) = 100; u = 1.
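
The three forms of T1c can be sketched in the same way. The Python functions below are illustrative only (their names are not from the text); they show that form (3), stated for the rf, gives the same value as form (2), stated for the af.

```python
from math import erf, isclose, sqrt


def Phi(u):
    """Probability integral (normal distribution function)."""
    return 0.5 * (1 + erf(u / sqrt(2)))


def symmetric_c_af(s, r1, delta, continuity=True):
    """Forms (1) and (2): c that s_1 lies within s*r1 +/- delta."""
    sigma = sqrt(s * r1 * (1 - r1))
    u = (delta + (0.5 if continuity else 0.0)) / sigma
    return 2 * Phi(u) - 1


def symmetric_c_rf(s, r1, q):
    """Form (3): c that the rf of M in the sample lies within r1 +/- q."""
    u = q * sqrt(s / (r1 * (1 - r1)))
    return 2 * Phi(u) - 1


if __name__ == "__main__":
    print(symmetric_c_af(2400, 0.6, 24))                    # form (1)
    print(symmetric_c_af(2400, 0.6, 24, continuity=False))  # form (2)
    print(symmetric_c_rf(2400, 0.6, 0.01))                  # form (3)
    assert isclose(symmetric_c_rf(2400, 0.6, 0.01),
                   symmetric_c_af(2400, 0.6, 24, continuity=False))
```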

In T1d, h refers to the opposite case of T1c; it says that s_1 is outside a
given symmetrical interval around s r_1. This theorem is often used when
the statement e concerning the rf of M in the population is not actually
known but either believed or merely considered and a sample is found
which shows a surprisingly large deviation δ from the most probable value
of the af (i.e., s r_1). Then the following question is raised: suppose we
knew that the rf in the population were r_1, how probable would it be for a
sample to have the observed deviation δ or a larger one to either side? T1d
gives the answer to this question. For example, we make a long series of
throws with a coin which has not been previously examined. We observe heads
in considerably more than half of the throws. Shall we take this as an
indication that the coin is loaded? Or could the coin be symmetrical and
the surprising outcome merely a strange accident? In order to judge the
latter case, we might be interested in determining the probability that a
coin which is symmetrical and hence yields heads in one-half of all throws
would give a result like the one observed or deviating even more from the
most probable result. This probability is determined by T1d. This theorem
is often used, in particular, by those statisticians who have no general
concept of probability_1 and reject the inverse inductive inference but
admit the direct inference and thereby Bernoulli’s theorem. For what we
call the rf in the population they use the term ‘probability’ in the sense
of probability_2. Their theory does not admit the question: ‘How probable
is it, on the basis of the observed sample, that the rf in the population
is 1/2, in other words, that the coin is symmetrical?’ Nevertheless they
wish, of course, to make an inductive judgment on the hypothesis of the
symmetry of the coin on the basis of the observed sample of throws. As a
substitute for the rejected question, they judge the hypothesis by the
probability determined by T1d. If the probability of a result like the one
observed or still further away from the expected rf 1/2 is very small, then
they reject the hypothesis of symmetry. If this probability is large, the
hypothesis may be accepted until further notice but need not be accepted.
If the probability is neither very small nor large, judgment on the
hypothesis is postponed until more observational results are available.
This use of T1d is thus a weaker inductive method used as a substitute for
those purposes for which a full inductive logic applies the inverse
inference. [See the later remarks (§ 98) on the methods of testing
hypotheses developed by Fisher, Neyman, and others.]


Example for T1d. Let us take δ = 72. This means that we ask for the
probability that s_1 lies outside the interval 1440 ± 71. Since this
interval is large, we may use the rougher approximation (2).
u = 72/24 = 3; Φ(u) = 0.99865; c = 2(1 − 0.99865) = 0.00270. This means
that there is only a chance of about one-fourth of 1 per cent for a sample
having an s_1 which deviates from 1440 by 72 or more. The method mentioned
above might use this result in the following way. If a sample with
s_1 = 1512 and hence δ = 72 is found, the hypothesis that this is a random
sample from a population with r_1 = 0.6 is rejected. (For instance, if
someone draws 2400 balls from a large bag containing 60 per cent white
balls and 40 per cent black balls and he finds 1512 white balls among the
2400 drawn, the assumption that this result is merely due to chance is
rejected.)
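
The c-value for a deviation of δ or more to either side can be sketched as follows. This Python fragment is an illustration only; the name two_sided_tail is introduced here, and the complementary error function is used because 2(1 − Φ(u)) equals erfc(u/√2).

```python
from math import erfc, sqrt


def two_sided_tail(s, r1, delta, continuity=True):
    """T1d: c that s_1 deviates from s*r1 by delta or more to either side.

    continuity=True  uses u = (delta - 1/2)/sigma   (form (1));
    continuity=False uses u = delta/sigma           (form (2)).
    """
    sigma = sqrt(s * r1 * (1 - r1))
    u = (delta - (0.5 if continuity else 0.0)) / sigma
    return erfc(u / sqrt(2))  # equals 2 * (1 - Phi(u))


if __name__ == "__main__":
    # The example above: s = 2400, r_1 = 0.6, delta = 72, form (2).
    print(two_sided_tail(2400, 0.6, 72, continuity=False))
```

For the figures of the example the function gives about 0.0027, that is, roughly one-fourth of 1 per cent.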

Now let us go back to T1c, the theorem for symmetrical intervals, and
compare the c-values determined by it for two samples; this will lead to an
important general result. For this comparison we do not need any specific
knowledge about the values of Φ and c but only the fact, obvious from the
definition of Φ, that Φ(u) increases with increasing u and hence c
increases with increasing δ. Within a population, where the rf of M is r_1,
we consider a first sample of size s and a second, larger sample of size
s′ = s m² with m > 1. In both samples we consider intervals for s_1 around
the most probable value of s_1; let the interval size be ±δ in the first
sample and ±δ′ in the second. We assume that both intervals are large
enough so that we can use the rougher approximation T1c(2). In the first
sample σ = √(s r_1 r_2); in the second sample, σ′ = √(s′ r_1 r_2) = mσ. If
now we choose the interval in the second sample m times as large as that in
the first, that is, with δ′ = mδ, then δ′/σ′ = δ/σ; hence c has the same
value in both samples. Thus we find the following results:

(i) If the size of the interval for af in the second sample is m times 
that in the first, c remains unchanged. 

(ii) If the interval size increases by a factor of less than m, c
decreases. This is the case in particular if the interval size for af
remains the same.

(iii) If the interval size for af increases by a factor of more than m, c
increases. This is the case in particular if the interval for af increases
by the factor m², hence
in proportion to the sample, and therefore the interval for rf remains the 
same. This is the most important result: c for the same rf-interval in- 
creases with increasing sample. 

Example. We considered previously, in the example for T1c, a sample with
s = 2400, and within it the interval 1440 ± 24 for the af of M; in other
words, the interval 0.6 ± 0.01 for the rf of M. Now we take a second sample
with s′ = 9600; m² = 4; that is, this sample has four times the size of the
first. The most probable value for af (s_1) is here s′r_1 = 5760.
σ′ = √(s′ r_1 r_2) = 48 = 2σ. We shall now examine three different
intervals within this sample:

(1) δ′ = 24 = δ. The interval for af is 5760 ± 24; therefore the interval
for rf is 0.6 ± 0.0025.

(2) δ′ = 48 = 2δ. The interval for af is 5760 ± 48, that for rf is
0.6 ± 0.005.

(3) δ′ = 96 = 4δ. The interval for af is 5760 ± 96; its size is four times
the earlier one, for a sample which has four times the earlier size; thus
the interval has increased in proportion to the sample. The interval for rf
is 0.6 ± 0.01; this interval is the same as that considered for the first
sample.

Now let us compare the c-value for the interval chosen in the first sample
with that for the interval (2) in the second sample. Here we have δ′ = 2δ.
We found earlier that σ′ = 2σ. Hence δ′/σ′ = δ/σ. Therefore, according to
T1c(2), c has the same value in these two cases. Since interval (1) is
smaller than (2), c for (1) is smaller than for (2), and hence smaller than
in the first sample. Since interval (3) is larger than (2), c for (3) is
greater than for (2), and hence greater than in the first sample.

Let us now compute the c-values, according to T1c(2), although they are not
required for the comparative results just found. Since σ′ = 48, for the
interval (1) δ′/σ′ = 0.5; Φ = 0.691; c = 0.382. For the interval (2),
δ′/σ′ = 1; Φ = 0.841; c = 0.682. For the interval (3), δ′/σ′ = 2;
Φ = 0.9773; c = 0.9546.
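
The comparison of the two samples can be reproduced with a few lines of Python. The sketch below is an illustration added here under the rougher approximation T1c(2); the function name c_symmetric is hypothetical.

```python
from math import erf, isclose, sqrt


def c_symmetric(s, r1, delta):
    """T1c(2): c that s_1 lies within s*r1 +/- delta, with u = delta/sigma."""
    sigma = sqrt(s * r1 * (1 - r1))
    # 2*Phi(u) - 1 equals erf(u / sqrt(2)).
    return erf((delta / sigma) / sqrt(2))


if __name__ == "__main__":
    first = c_symmetric(2400, 0.6, 24)  # first sample, interval 1440 +/- 24
    for delta2 in (24, 48, 96):         # intervals (1), (2), (3)
        second = c_symmetric(9600, 0.6, delta2)
        print(delta2, round(second, 4), "first sample:", round(first, 4))
    # Interval (2) scales with m = 2, so c is unchanged.
    assert isclose(c_symmetric(9600, 0.6, 48), first)
```

Interval (3), which keeps the rf-interval 0.6 ± 0.01 fixed, yields the larger c-value, in accordance with result (iii).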


The last result we found under (iii) was this: If we take a fixed interval,
however small, for rf around the most probable value r_1, say, the interval
r_1 ± q, and go to larger and larger samples, then the c for the assumption
that the rf of M in the sample lies in this interval grows more and more.
Now Bernoulli’s Limit Theorem T1e says that we can bring this c as close
to 1 as we want to by taking a sufficiently large sample; in other words,
that, with increasing s, c converges toward the limit 1. (Note that the
theorem, even in the latter formulation in terms of a limit, speaks about
an infinite sequence of larger and larger finite samples, not about an
infinite sample.)


Example for the limit theorem T1e. We take r_1 = 0.6 as in the earlier
examples so that the earlier results hold for the samples there considered.
However, we do not fix the size of the population beforehand. r_1 is either
the limit of rf for M in an infinite population or the rf in a finite
population of size n provided that n is large enough to accommodate the
sample size s which we shall determine. Let us consider a small interval
around r_1, say, 0.6 ± 0.0025. We wish to determine how probable the
assumption is that the rf of M in samples of various sizes lies within this
interval. We found earlier (second sample, interval (1)) that, if the
sample size s is 9600, c for our assumption is 0.382. This is not a large
probability; that is not surprising, since we chose a rather small interval
for rf. Now the limit theorem says that, in spite of the smallness of the
interval chosen, we can bring c as close to 1 as we want to if only we take
sufficiently large samples. Suppose we want c for our assumption to be
≥ 0.999. Bernoulli’s theorem in its formulation for an rf-interval (T1c(3))
says this: if the interval for rf is r_1 ± q, then
c = 2Φ(q √(s/(r_1 r_2))) − 1. In order to make c = 2Φ − 1 ≥ 0.999, we must
have 2Φ ≥ 1.999, Φ ≥ 0.9995. We find in the table for Φ (T40-20) that
Φ(3.3) = 0.9995; and Φ increases with increasing argument. Therefore, we
must have q √(s/(r_1 r_2)) ≥ 3.3. Since q = 0.0025 and r_1 r_2 = 0.24, s
must be ≥ 418,280. This means that for a sample of size 418,280, c = 0.999;
and for any larger sample, c > 0.999. For instance, for a sample containing
500,000 individuals we find c = 0.99968; for one million individuals,
c > 0.999999. (The population of three million considered originally is not
large in relation to these samples; the results here obtained presuppose a
much larger population.)
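
The growth of c with the sample size, and the sample size needed for a required c, can be sketched in Python as follows. The names c_rf_interval and min_sample_size are introduced here for illustration; the threshold is obtained by solving q √(s/(r_1 r_2)) ≥ u for s, with u read from a table as in the example.

```python
from math import ceil, erf, sqrt


def c_rf_interval(s, r1, q):
    """T1c(3): c that the rf of M in a sample of size s lies in r1 +/- q."""
    u = q * sqrt(s / (r1 * (1 - r1)))
    return erf(u / sqrt(2))  # equals 2*Phi(u) - 1


def min_sample_size(q, r1, u):
    """Smallest s with q*sqrt(s/(r1*r2)) >= u, i.e. s >= (u/q)^2 * r1*r2."""
    return ceil((u / q) ** 2 * r1 * (1 - r1))


if __name__ == "__main__":
    for s in (9_600, 100_000, 500_000, 1_000_000):
        print(s, c_rf_interval(s, 0.6, 0.0025))
    # With the tabulated argument u = 3.3 used in the example:
    print(min_sample_size(0.0025, 0.6, 3.3))
```

With u = 3.3 the threshold comes out at roughly 418,000, of the same order as the figure quoted in the example; the exact last digits depend on how the tabulated argument is rounded.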


This concludes our discussion of the direct inductive inference. The gen- 
eral theorems concerning this inference have first been given (§ 94), and 
then the classical formulas of the binomial law (§ 95) and of Bernoulli’s 
theorems (§ 96). The latter two are here construed as approximations for 
special cases of the direct inference. The other kinds of inductive inference 
will be dealt with in later chapters in Volume II (see the summary in
§ 110).




CHAPTER IX 
ESTIMATION 


Besides the determination of the degree of confirmation, perhaps the most
important task of inductive logic is that of estimation, that is, of determining
an estimate of the unknown value of a magnitude on the basis of given evi- 
dence. We propose to explicate the estimate as a mean of the possible values 
of the magnitude; not simply the arithmetic mean but rather a weighted mean 
with degree of confirmation as weight (§§ 98, 99). Consequently, we define the 
c-mean estimate of a magnitude on the basis of the evidence e as the sum of
the possible values of the magnitude, each value multiplied with the degree of 
confirmation for its occurrence (§ 100A). If an estimate (now always understood 
in the sense of c-mean estimate) for a magnitude on evidence e has been de- 
termined, the question arises how reliable it is, that is to say, how probable it 
is that its error, i.e., the difference between the estimated value of the magni- 
tude and its actual value, is small. We take as measure of the reliability (or, 
rather, unreliability) of the first estimate another estimate, viz., the estimate 
of the square of the error of the first estimate (§§ 102, 103). The discussions so 
far, constituting the first part of the chapter, deal with the problem of the 
estimation of any magnitude in general. These considerations apply to any 
language containing quantitative concepts, provided a concept of degree of 
confirmation for that language is available. 
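
The two definitions just outlined can be illustrated by a small Python sketch, added here and not part of the original text. The function names are hypothetical; the c-values are those of c(h_st,e) for a sample of s = 7 from a population with n = 14 and n_1 = 10, written in the equivalent form C(10,k) C(4,7−k)/C(14,7).

```python
from fractions import Fraction
from math import comb


def c_mean_estimate(values_and_c):
    """Sum of the possible values, each multiplied by its c-value."""
    return sum(value * c for value, c in values_and_c)


def estimated_square_error(values_and_c):
    """c-mean estimate of the squared error of the first estimate."""
    estimate = c_mean_estimate(values_and_c)
    return sum((value - estimate) ** 2 * c for value, c in values_and_c)


# Possible values of s_1 (3 to 7) with their c-values on the evidence e.
distribution = [
    (k, Fraction(comb(10, k) * comb(4, 7 - k), comb(14, 7)))
    for k in range(3, 8)
]

if __name__ == "__main__":
    print(c_mean_estimate(distribution))         # estimate of s_1
    print(estimated_square_error(distribution))  # estimate of squared error
```

In this case the estimate of s_1 coincides with s r_1 = 5, the value that the direct inference also singles out as the most probable one.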

The second part of the chapter refers to the language systems 𝔏 to which the
theory of degree of confirmation developed in this book applies. The general
concept of (c-mean) estimation, discussed in the first part in general terms, is
here applied to the two chief quantitative magnitudes expressible in the
systems 𝔏, viz., absolute and relative frequency. This may be either the frequency
of truth in a class of sentences or the frequency of a given property M in a class 
K of individuals (§ 104). In analogy to the earlier distinction between various 
kinds of inductive inference (§ 44B), we distinguish now the corresponding 
kinds of estimation. The direct and the predictive estimation of frequencies are 
dealt with in particular. The direct estimation applies to the case where the 
evidence e gives the frequency of M in a population and the estimate is made 
for its frequency in a sample taken from the population (§ 105). In the case 
of the predictive estimation, e states the frequency of M in one sample and the 
estimate is made for its frequency in a second sample not overlapping with the 
first (§§ 106, 107). 


§ 98. The Problem of Estimation 


The procedure of estimating the unknown value of a magnitude is an induc- 
tive procedure. Therefore, we shall try (in the subsequent sections) to base this 
procedure on the concept of degree of confirmation as the fundamental concept 
of inductive logic. Contemporary statisticians (especially R. A. Fisher, Ney- 
man, Pearson, Wald) have developed methods of estimation independent of the 
concept of degree of confirmation; but there is no agreement as to the logical 
foundations and the validity of these methods. After rejecting the classical
methods based on the principle of indifference, the statisticians seem to have
given up the hope of finding an adequate explicatum for probability₁; this was
probably their chief reason for constructing independent methods of estimation. 


We have seen earlier (§§ 49-51) that the concept of an estimate of an 
unknown value of a magnitude plays an important role in the applica- 
tion of inductive logic for the rational determination of decisions. We also 
briefly indicated a way for defining the concept of estimate in terms of 
probability₁ (§ 41D (3)), without, however, discussing the technical de-
tails. This led to a clearer understanding of the meaning of the logical
concept of probability₁ and its relation to the statistical concept of prob-
ability₂. We found that probability₁ may, in certain cases, be interpreted
as an estimate of probability₂ (§ 41D). In the present chapter we shall
again take up the general problem of estimation and analyze it with the 
technical means developed in the intermediate chapters. We shall state 
in more exact terms the definition of a general estimate-function, called 
the c-mean estimate-function e, taking as the basis not the concept of
probability₁ but that of a regular c-function as an explicatum of prob-
ability₁. Then we shall discuss, on the basis of this definition, various
problems of estimation. In particular, we shall develop the theory of
estimates of frequencies with respect to our systems 𝔏.

Both in everyday life and in the practice of science, estimates are made 
of the unknown values of magnitudes. The treasury makes an estimate of 
the income to be expected from a new tax, a hostess makes a guess as to 
the number of guests who will come, a general estimates the strength of 
the forces the enemy has now or will have tomorrow at a certain place, a 
physicist tries to find the best value for the velocity of light on the basis 
of several measurements which have yielded slightly different values. An 
estimate we make cannot be asserted with certainty. Strictly speaking, it 
is a guess. That does not mean that it is necessarily an arbitrary guess, 
that “any guess is as good as any other”. Sometimes it is a “good guess”, 
that is to say, the estimate is made by a careful procedure; but even for 
the most careful estimation there is no guarantee of success. To make a 
careful estimate means to utilize all relevant knowledge available and to 
reason well in deriving the estimate from this knowledge. Since the pro- 
cedure of estimation cannot lead to certainty, it is not a deductive but an 
inductive procedure. In an ordinary inductive procedure we reason from 
given knowledge, the evidence e, to an unknown event, expressed in the 
hypothesis h; for instance, from our knowledge about the present weather
situation and earlier meteorological observations to the prediction that it
will rain tomorrow. The procedure of estimation is similar to this ordinary
inductive procedure, but with this difference: the question is no longer
how probable it is that it will rain tomorrow but, rather, how much will it
presumably rain tomorrow. 

Since estimation is an inductive procedure, it is the task of inductive 
logic to provide a method for it. We regard the concept of degree of con- 
firmation as the fundamental concept of inductive logic. Accordingly, we 
shall show how the concept of an estimate can be based on that of degree 
of confirmation. Although scientists make estimations all the time and 
elaborate methods of estimation have been developed in mathematical 
statistics, there is at present no agreement as to the nature and definition 
of the concept of estimate and as to its relation to the concept of prob-
ability₁ or degree of confirmation. As was mentioned earlier, many scien-
tists are skeptical with respect to the possibility of constructing an ade-
quate quantitative explicatum for probability₁. Therefore, some statisti-
cians have developed independent methods of estimation, that is to say,
methods not based on probability₁. However, the present situation in the
theory of estimation as dealt with in treatises on probability and sta- 
tistics gives a startling spectacle of unsolved controversies and mutual 
misunderstandings, all the more disturbing when we compare it with the 
exactness, clarity, and possibility of coming to a general agreement in 
other fields of mathematics. A typical example of this situation is the 
problem, famous and much discussed since classical times, of estimating 
simultaneously the mean and the variance (mean square deviation, square 
of the standard deviation) of the distribution of a magnitude in the whole 
population on the basis of the mean and the variance found in an observed 
sample of n individuals. It was originally customary to take in this case
as estimate for the variance in the population simply the variance found 
in the sample. Then the mathematician Carl Friedrich Gauss (in 1823) and 
the astronomer Bessel suggested a modified estimate containing the cor-
rective factor n/(n − 1). [Concerning the historical origin of this correc-
tion see Wolfenden [Statistics], p. 164.] It seems that the majority of sta-
tisticians since that time have regarded the modified value as a more
adequate estimate. However, this is not a matter of refutation of one re- 
sult and proof of another result but rather a matter of plausibility. This 
is shown by the fact that even today some statisticians, and among them 
prominent men in the field, still regard in certain cases the original value 
of the estimate as adequate. [For instance, R. A. Fisher’s method of maxi- 
mum likelihood yields the original value; see Wolfenden [Statistics], pp. 
39 f.; Cramér [Statistics], p. 504.] Some statisticians declare frankly that
any assertion of the validity of either value would be purely dogmatic and
that this and similar controversial questions of estimation are ultimately 
matters of taste. 
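
The difference between the two competing estimates of the variance can be made concrete by a small computation. The following Python sketch is an illustration added here, not part of the original exposition; the sample values and the function names are assumptions chosen for the purpose.

```python
from fractions import Fraction


def sample_variance(values):
    """Mean square deviation found in the sample (the 'original' estimate)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n


def corrected_variance(values):
    """The sample variance multiplied by the corrective factor n/(n - 1)."""
    n = len(values)
    return sample_variance(values) * Fraction(n, n - 1)


# Hypothetical observed sample of n = 4 individuals (integer values).
observations = [4, 7, 7, 10]
original = sample_variance(observations)
corrected = corrected_variance(observations)
print("original estimate: ", original)
print("corrected estimate:", corrected)
```

For a small sample the two values differ noticeably; as n grows, the factor n/(n − 1) approaches 1 and the two estimates come close together.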

R. A. Fisher has carried out a systematic investigation of methods of 
estimation, which led to many fruitful results. He examined estimate-func- 
tions with respect to their “efficiency” and other properties. In spite of 
the great advance which the theory of estimation has made in recent 
decades due to the work of Fisher and other statisticians, it is under- 
standable that many statisticians even today regard the situation in this 
field as rather unsatisfactory. This is not merely due to the fact that any 
procedure of estimation depends upon a choice, which is a matter of prac- 
tical decision and not uniquely determined by purely theoretical, logico- 
mathematical considerations. There are many points in the procedure of 
science which involve a choice; for instance, the choice of a system of ge- 
ometry as a theory of physical space or, alternatively, the choice of an 
operational definition for equality of distances. Also for the method which 
we shall propose here, which bases the concept of estimate upon that of 
degree of confirmation, there remains still the necessity of a choice, viz., 
the choice of a concept of degree of confirmation as an adequate explica-
tum for probability₁. But the advantage of this method is this: only one
fundamental decision is required. As soon as anybody makes this de- 
cision, that is to say, chooses a concept of degree of confirmation which 
seems to him adequate, then he is in the possession of a general method 
which makes it possible to deal with all the various problems of inductive 
logic in a coherent and systematic way, including the problems of esti- 
mation. Thus this method helps to overcome what seems to me the great- 
est weakness in the contemporary statistical theory of estimation, namely, 
the lack of a general method. There is in general (with the exception of 
Fisher’s method, see below) no unique set of rules, say, in the form of a 
postulate system, let alone an explicit definition for the concept of esti- 
mate. Instead, for every new problem of estimation, new considerations 
of plausibility are made which may lead to the adoption of a particular pro-
cedure for that particular problem. Consequently, any proposed solution
of a problem of estimation is more or less isolated and often not well in 
accord with accepted solutions of other problems. For instance, nobody 
can say today whether Gauss’s solution of the problem of the estimation 
of variance and mean may be regarded as definitive or whether some-
body might not come tomorrow with similar plausibility considerations
and show us that it seems advisable to take still another value as the 
estimate. 
While it is true that the work of most of the contemporary statisticians
in problems of estimation is not based upon a general method, there is one 
partial exception to this statement. R. A. Fisher’s method of maximum 
likelihood is a general method at least for a large class of problems of 
estimation, including those most frequently dealt with in modern statis- 
tics. This method applies to those cases in which the estimation concerns 
the value of a parameter in the distribution of some magnitudes in the 
whole population and the evidence describes the distribution of the same 
magnitudes in a given sample. The maximum likelihood estimate is that 
value among the possible values of the parameter for which the likelihood 
(that is, the probability on the basis of a given parameter value) of the 
observed sample is a maximum. (‘Probability’ is to be understood here in
the sense of probability₂; it is equal to the probability₁ for the descrip-
tion of the given sample as hypothesis and the statement of any value of
the parameter as evidence.) The validity of this method is today still 
controversial. It seems that the majority of statisticians are not willing to 
accept it as a general method (which would, for instance, imply the rejec- 
tion of Gauss’s correction in some cases, see above) although they apply it 
frequently in certain kinds of problems. In a later chapter (in Vol. II) we 
shall discuss in greater detail that part of the method of maximum likeli- 
hood which applies to those problems of estimation which occur in the 
limited domain of our system of inductive logic, especially problems of 
estimation of relative frequency. It will then be shown that there are 
serious reasons for doubting the adequacy of the values to which this 
method leads in many cases on the basis of small samples. But even if 
the method cannot be regarded as generally valid, the estimates which 
result from it are in many cases practically acceptable as approximations, 
especially when the sample which serves as basis is sufficiently large; and 
in these cases the use of the method is often very convenient. 
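
The rule of maximum likelihood described above can be sketched for the estimation of a relative frequency. The following Python fragment is an added illustration under stated assumptions: the likelihood is taken to be binomial (as for a very large population), the candidate parameter values are restricted to a grid of hundredths, and all names are hypothetical.

```python
from fractions import Fraction
from math import comb


def likelihood(r, n, s):
    """Probability of a sample of n individuals containing s cases of M,
    given the parameter value r (relative frequency of M in the population)."""
    return comb(n, s) * r**s * (1 - r) ** (n - s)


def max_likelihood_estimate(candidates, n, s):
    """The candidate value for which the likelihood of the observed sample is a maximum."""
    return max(candidates, key=lambda r: likelihood(r, n, s))


# Assumed grid of possible parameter values 0, 1/100, ..., 1.
candidates = [Fraction(k, 100) for k in range(101)]

ml_large = max_likelihood_estimate(candidates, n=10, s=3)
ml_small = max_likelihood_estimate(candidates, n=3, s=0)
print(ml_large, ml_small)
```

In both calls the estimate coincides with the relative frequency observed in the sample. In the second call a sample of only three individuals, none of which is M, yields the extreme estimate 0; this is offered merely as an instance of a small-sample value, not as a statement of the objections reserved for the later discussion.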

An incidental remark apropos the concept of an approximate estimate 
may be made here. It is sometimes said that all estimates are approxi- 
mations. This formulation seems to me misleading. What is presumably 
meant is that we have no certainty that our estimate of a magnitude 
is equal to or even close to its true value; and this is, of course, correct. 
But it would be better not to use the term ‘approximation’ for this 
fact. The term in its original and generally accepted sense is needed 
in order to distinguish between an exact estimate and an approxi- 
mate estimate. When somebody has chosen a method of estimation, 
he may use it in a given case either exactly or with some convenient 
simplification in order to find in a shorter time a somewhat deviating 
value. Suppose, for example, that a rule of estimation which he has ac-
cepted says that in certain cases the arithmetic mean of the observed
values of a magnitude is to be taken as the estimate. Suppose further that 
in a given case of this kind the observed values are 79.852 and 82.176. 
Then he might, if he is in a hurry or lazy, reason as follows: the first value 
is somewhat less than 80, the second is somewhat more than 82; hence 
let’s take 81 as an estimate good enough for the practical purposes at 
hand. In this case, 81 is an approximate estimate; 81.014 is the exact 
estimate. This means merely that 81.014 is exactly that value to which 
the method leads on the basis of the two given values. It does not, of 
course, mean that this is the true value. 
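
The arithmetic of this example may be checked as follows. The short Python sketch is an added illustration; the variable names are assumptions, and decimal arithmetic is used so that the printed values are not disturbed by binary rounding.

```python
from decimal import Decimal

# The two observed values from the example.
observed = [Decimal("79.852"), Decimal("82.176")]

# Exact application of the accepted rule: the arithmetic mean.
exact_estimate = sum(observed) / len(observed)

# The convenient simplification used by the hurried observer.
approximate_estimate = Decimal("81")

deviation = exact_estimate - approximate_estimate
print(exact_estimate, deviation)
```

The deviation between the exact and the approximate estimate concerns only the application of the method; neither value is thereby asserted to be the true value of the magnitude.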

The most influential school in contemporary mathematical statistics 
and especially in the theory of statistical inference and estimation, besides 
that of Fisher, is the school founded by J. Neyman and E. S. Pearson. 
These authors do not, like Fisher, base their work on one particular meth- 
od. Instead they make very fruitful and interesting general investigations 
concerning methods of testing statistical hypotheses. A method of de-
termining an estimate u′ for a parameter in the population on the basis
of an observed sample as evidence is a special case of this kind; in this
case the hypothesis asserts that the actual value u of the parameter is u′.
Since these authors, like Fisher, do not believe in the possibility of an in-
verse inductive inference based on a quantitative explicatum for proba-
bility₁, they regard it as the task of an observer X merely to decide
whether he should reject or accept the hypothesis in question, but not to 
determine its degree of confirmation. X’s decision is based upon the 
evidence concerning the sample he has observed. A test method for the 
determination of this decision may be given by characterizing a class of 
possible samples (called ‘the critical region in the sample space’); if the 
observed sample belongs to this class, the hypothesis is to be rejected. 
The resulting decision may be wrong in either of two different ways: (1) 
X may reject the hypothesis although it is true; in the example of the 
estimate: X assumes that u ≠ u′ although in fact u = u′; this is called
an error of type I; or (2) X may accept the hypothesis although it is
false; in the example: X assumes that u = u′ although in fact u ≠ u′;
this is called an error of type II. The aim is to control the errors of the
first kind on a fixed level of probability₂ and then to diminish the prob-
ability₂ of occurrences of errors of the second kind as much as possible.
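
The two kinds of error can be illustrated numerically. The following Python sketch is an addition under stated assumptions: the parameter u is the relative frequency of a property in the population, the sample is a count out of n = 20 with a binomial distribution, and the critical region, the tested value u′ = 0.5, and the alternative value 0.8 are hypothetical choices.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k cases in a sample of n when the frequency is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_probability(critical_region, n, p):
    """Probability that the observed count falls into the critical region."""
    return sum(binom_pmf(k, n, p) for k in critical_region)


n = 20
u_prime = 0.5  # the hypothesis asserts u = u'

# Assumed critical region: counts far from n * u' lead to rejection.
critical_region = [k for k in range(n + 1) if k <= 5 or k >= 15]

# Error of type I: rejecting the hypothesis although u = u'.
type_I = rejection_probability(critical_region, n, u_prime)

# Error of type II: accepting the hypothesis although in fact u = 0.8.
type_II = 1 - rejection_probability(critical_region, n, 0.8)

print(type_I, type_II)
```

Enlarging the critical region would diminish the second value at the cost of raising the first; the procedure described in the text fixes the first and then seeks the region that makes the second as small as possible.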

Another method developed by Neyman is the method of making an 
interval estimate for the unknown value u of a parameter characterizing
the population on the basis of a given sample. For a fixed value r, say 0.99,
called the confidence coefficient, the method determines an interval
(u₁,u₂), called the confidence interval. Then the estimate-assumption is
made that the actual value u of the parameter lies within the interval
(u₁,u₂). The theory states that the probability for obtaining correct state-
ments of this kind in a series of repeated drawings of samples of equal size
from the same population is r. It is to be noted that ‘probability’ is here
meant in the sense of probability₂; the term ‘confidence coefficient’ must
not be understood in the sense of probability₁ or degree of confirmation.
Let e be the description of the observed sample and hₑ the statement that
the actual value u lies within the confidence interval determined on the
basis of e. The above statement of the probability value r cannot be in-
terpreted as saying that c(hₑ,e) = r; any question concerning the prob-
ability of a parameter value in the population with respect to given evi-
dence concerning an observed sample is regarded as meaningless in this 
theory, as generally in contemporary mathematical statistics. This follows 
from the clear explanations given by Neyman himself ([Outline], pp.
347 ff.) as well as from those by other authors (see Wald [Principles],
pp. 25 ff., Wilks [Statistics], pp. 122 ff., Cramér [Statistics], pp. 512 ff.).
The probability₂-value r, like any probability₂-value, is indeed equal to
a value of probability₁ or c. This holds, however, only for the c in a direct
inference (as mentioned in § 94), not in an inverse inference. Thus, if j is
the assumption that the actual value of the parameter is u, then, for any e
describing any sample, the c of hₑ with respect to j, not to e, equals r.
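
That the confidence coefficient refers to the frequency of correct statements in repeated sampling, with the parameter value held fixed, can be illustrated by a simulation. The following Python sketch is an addition under strong assumptions not found in the text: the parameter is the mean of a normal distribution with known standard deviation, the interval is the customary one for that case, and the seed, the sample size, and all names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def confidence_interval(sample, sigma, r):
    """Interval (u1, u2) for the mean, assuming a normal population with known sigma."""
    z = NormalDist().inv_cdf((1 + r) / 2)
    half_width = z * sigma / len(sample) ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width


def coverage(u, sigma, n, r, trials, seed=0):
    """Relative frequency of samples whose interval contains the fixed actual value u."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(u, sigma) for _ in range(n)]
        u1, u2 = confidence_interval(sample, sigma, r)
        hits += u1 <= u <= u2
    return hits / trials


observed_rate = coverage(u=3.0, sigma=1.0, n=25, r=0.99, trials=4000)
print(observed_rate)
```

The actual value u is given to the function `coverage` from the outset; the computed frequency is thus a value for statements about samples with respect to the assumption j about the parameter, in the manner of a direct inference, and says nothing about any single interval with respect to the sample from which it was obtained.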

The ideas of Neyman and Pearson have been further developed by 
A. Wald ([Contributions], [Principles], [Risk]). He investigates decision
functions (e.g., estimate-functions which determine an estimate u′ of the
value of a parameter u in the population as a function of an observed
sample as evidence) from the point of view of the risk functions associated 
with them (the risk function states the expectation value of the loss suf- 
fered by X if he chooses a particular estimate-function, as a function of 
the actual value u of the parameter). Wald ([Risk]) considers, in particu-
lar, the so-called minimax-rule for the choice of a decision function, e.g.,
an estimate function: X determines for every decision function the risk 
as a function of u, and then the maximum risk for varying u; then the 
rule tells him to choose that decision function for which this maximum 
risk is a minimum. For any given observed sample as evidence, the chosen 
decision function will then determine the decision to be taken. This rule 
leads in general to decisions different from those determined by our rules 
R₄ (§ 50E) and R₅ (§ 51A). Our rules prescribe to minimize, not the maxi-
mum risk, as Wald’s rule does, but the probability₁-weighted risk (R₄ con-
siders the risk with respect to the monetary gain, R₅ with respect to the
utility). The probability₁-weighted risk will be explicated in this chapter
as the c-weighted mean of the loss. We shall later discuss Wald’s rule and 
compare it with our rule (in Vol. II). 
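
How the minimax-rule and the rule of the c-weighted risk can lead to different decisions may be seen from a small table. The following Python sketch is an added illustration; the two decision functions, the two parameter values, the risk values, and the c-values are all hypothetical, and the risk function of each decision function is simply assumed as given.

```python
from fractions import Fraction

# risk[d][u]: assumed risk of decision function d if the actual parameter value is u.
risk = {
    "d1": {"u1": Fraction(1), "u2": Fraction(9)},
    "d2": {"u1": Fraction(5), "u2": Fraction(6)},
}

# Assumed c-values of the two parameter values on the available evidence.
c = {"u1": Fraction(9, 10), "u2": Fraction(1, 10)}


def minimax_choice(risk):
    """Choose the decision function whose maximum risk is a minimum."""
    return min(risk, key=lambda d: max(risk[d].values()))


def c_weighted_choice(risk, c):
    """Choose the decision function whose c-weighted mean risk is a minimum."""
    return min(risk, key=lambda d: sum(c[u] * risk[d][u] for u in c))


print(minimax_choice(risk), c_weighted_choice(risk, c))
```

Here the minimax-rule selects d2, whose worst case is the less severe, while the c-weighted rule selects d1, because the parameter value under which d1 fares badly has only a small c on the evidence.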

Why did statisticians spend so much effort in developing independent 
methods of estimation, i.e., methods not based on a concept of proba-
bility₁? One gets the impression that the strongest motive was not a posi-
tive one, say, the attraction of convincing and fruitful methods of a new 
kind. It seems clear that even for Gauss and still more for contemporary 
statisticians the main reason was purely negative; it was the dissatisfac- 
tion with the classical approach, in particular with the principle of indif- 
ference (or insufficient reason). This principle enters the problems of es- 
timation for parameters of the population by way of the Bayes-Laplace 
theorem. Since this principle leads sometimes to quite absurd results and 
in its strongest form even to contradictions, it must indeed be rejected. 
The classical theory of probability was essentially based on this prin-
ciple, and there was no other general theory of probability₁ avoiding this
principle. Therefore, it is psychologically understandable that statisticians 
believed themselves compelled to look for independent methods of estima- 
tion as the only solution. By developing these methods, modern statisti- 
cians made important progress in comparison with the uncritical methods 
which were still in almost general use throughout the nineteenth century. 
Thus, it is due to their work that the statistical practice today is sounder 
and leads to more reliable values of estimates and to a more efficient de- 
sign of experiments than was previously the case. 

However, today it is necessary to re-examine the question of the neces- 
sity of independent methods. If we have to come to the conclusion that 
there is no adequate quantitative explicatum for probability₁, then the
methods developed by Fisher, Neyman, Pearson, and Wald or new meth-
ods of a similar nature are presumably the best instruments for estimating
parameter values and testing hypotheses. They are ingenious devices for
achieving these ends without making use of any general explicatum for
probability₁ as far as the ends can be achieved under this restricting con-
dition. If, on the other hand, we should find it possible to define a concept
of degree of confirmation which does not lead to the inacceptable conse- 
quences of the principle of indifference, then the main reason for develop- 
ing independent methods of estimation and testing would vanish. Then 
it would seem more natural to take the degree of confirmation as the basic 
concept for all of inductive statistics. This would lead to simpler and more
effective procedures. In testing a given hypothesis concerning the popula-
tion or an unobserved sample, we could take its degree of confirmation
with respect to the observed sample as a measure of its acceptability. All 
problems of estimation could then be answered by one general estimate- 
function to be defined on the basis of degree of confirmation. The latter 
definition can easily be constructed in analogy to customary conceptions, 
as we shall soon see. 

The fundamental problem of the possibility of an adequate explicatum 
for probability₁ is at present still an open question. Only the first steps
toward an affirmative answer have been made: in this volume the theory 
of regular and symmetrical c-functions has been developed, and the theory 
of the function c* will be given in the second volume. However, these parts 
of inductive logic apply only to our simple language 𝔏. It remains a task
for the future to extend the theory to more comprehensive languages, 
first to a co-ordinate language which makes it possible to take into ac- 
count the temporal order of events (see the explanations in § 15B), and 
finally to the full quantitative language of physics. It is true that these 
extensions, especially the latter one, involve serious difficulties which 
certainly should not be underestimated. But it seems to me that the situa- 
tion, as we see it today, gives no reason for regarding the difficulties as 
insuperable. Some ways which might lead to an adequate extended theory
will be discussed later (in Vol. II). 

In the foregoing discussion we have looked at methods of estimation 
from the point of view of their logical form, distinguishing between those 
based on degree of confirmation and the independent ones. However, the 
goodness of a method is to be judged primarily, not by its form, but by its 
results. Therefore, a method of estimation must be judged primarily, not 
by simply asking whether or not it is independent, but rather by examin- 
ing the estimates to which it leads. Later (in Vol. II) we shall compare 
the function c*, which is our explicatum for probability₁ (see § 110A),
with other explicata with respect to the adequacy of the values of degree
of confirmation for given cases of a hypothesis h and an evidence e. Then
we shall also make comparisons of various methods of estimation. We 
shall see in the next section how any concept of degree of confirmation 
leads to a general estimate-function based upon it. Therefore, we shall 
later compare the method of estimation based on the various concepts 
of confirmation, c* and others, but also independent methods of estima- 
tion proposed or discussed in modern statistics. The comparison will have 
the purpose of judging the goodness of the various methods irrespective 
of their logical form. Therefore, we shall have to judge the adequacy of 
the values of estimates furnished by the various methods for given cases. 
The comparison will chiefly concern estimates of the relative frequency of
a property in the whole population on the basis of an observed sample, 
because the problem of this kind of estimate belongs to the most impor- 
tant problems of estimation, and it can be dealt with in our system of in- 
ductive logic, as we shall soon see (§§ 104 ff.). In certain cases all methods 
under consideration supply equal estimates (for instance, in the example 
of the lottery mentioned in the next section); in others the estimates will 
be close together. But there will also be cases where some of the methods 
lead to very different estimates. In some cases a discussion of these dif- 
ferent values will show that one of them is adequate and the other in- 
adequate, or at least that one is more adequate than the other. In this 
way we shall try to judge the adequacy of the methods. 


§ 99. A General Estimate-Function 


Suppose that, on the basis of given evidence e, several values of a certain 
magnitude are possible with various degrees of confirmation. Then it seems nat- 
ural to take as the estimate of the magnitude the weighted mean of the possible 
values with the degrees of confirmation as weights. This is called the c-mean 
estimate. 


We shall now study the question as to how a general method of esti- 
mation can be defined in terms of degree of confirmation. We imagine a 
scientist X who is in possession of a concept of degree of confirmation, say 
c, which he regards as adequate, in application to the sentences of his
scientific language. We leave aside the question how the concept c is de-
fined and for what reasons X has chosen this particular concept. We are
only interested in the problem how he can use the chosen concept c in
order to construct a method of estimation, a general method that will 
supply estimates for all kinds of magnitudes expressible in his language. 
The general discussion in this and the following sections will not be re- 
stricted to our language systems 𝔏 but will refer to any language contain-
ing quantitative concepts. This may, for instance, be a language of phys- 
ics containing numerical functions like length, mass, temperature, etc., or 
a language of economics containing, in addition to physical concepts, 
economic quantitative concepts like price, wage, demand, etc. We sup- 
pose that the concept c is applicable to the sentences of this language. 
Since the c-functions of our inductive logic apply only to the simple lan-
guages 𝔏 and no adequate concept of degree of confirmation for more com-
prehensive languages of the kinds just mentioned has been constructed
so far, the assumption of the concept c is made in anticipation of the future 
development of inductive logic. 
Suppose that X wants to make an estimate of the unknown value of
the function f for the argument u (e.g., the temperature at the space-time
point u, or the cardinal number of the class u of the hydrogen molecules in
a given vessel). Although X does not know the value f(u), he knows other
data related to it. Let e be the evidence available to him on the basis of
which he attempts to find an estimate of f(u). Let us assume that there is
only a finite number of possible values for f(u), say r₁, r₂, ..., rₙ, and
that X is aware of this fact on the basis either of the definitions of f and u,
or of the evidence e. This assumption serves to simplify the following dis-
cussion; the result can later be extended to the more general case. Let hₚ
(p = 1, ..., n) be the hypothesis that f(u) = rₚ. Then the assumption
mentioned means that h₁ ∨ h₂ ∨ ... ∨ hₙ is either L-true or at least L-im-
plied by e:

(1) ⊢ e ⊃ h₁ ∨ ... ∨ hₙ .


[For example, X wants to make an estimation as to how many of a hun-
dred given objects have the property M. Here n = 101; the possible
values are the cardinal numbers 0, 1, ..., 100.] We presuppose further
that f (either by definition or on the basis of e) is a univalued function, 
that is to say, only one of the values can occur; hence 


(2) h₁, ..., hₙ are L-exclusive in pairs (D20-2d) with respect to e .


How should X proceed in order to obtain an estimate on the basis of the 
possible values r₁, ..., rₙ? He might perhaps consider taking simply an
average of these values, for instance, the arithmetic mean or the median. 
However, this procedure would seem very crude and unsatisfactory, be- 
cause it does not utilize all the relevant information contained in e. If we 
see from e that there is more reason to expect some of the possible values
than the others, then it would be wrong to treat all values alike. It would 
seem more appropriate to take a weighted average than a simple average. 
The concept of a weighted mean is generally defined in this way: if 
weights wₚ are assigned to the values rₚ, the weighted mean is


(3) Σₚ [rₚ × wₚ] / Σₚ wₚ .


What should we take as weights of the values rₚ for the purpose of an
estimation? It seems natural to give to a value rₚ the more weight the
more probable its occurrence is. Hence we shall take as weight of rₚ
c(hₚ,e); the resulting c-weighted mean will be called, for short, the c-mean.
Let D be the denominator in (3); that is,


(4) D = Σₚ c(hₚ,e) .

Therefore (from (2), with T59-1m):


(5) D = c(h₁ ∨ ... ∨ hₙ, e) .

Hence (from (1), with T59-1b):

(6) D = 1 .


Thus the denominator in (3) can be omitted. Hence the c-mean is equal
to the numerator in (3), that is, the sum of the possible values rₚ, each
multiplied with the c for its occurrence:


(7) Σₚ [rₚ × c(hₚ,e)] .


This we shall use for our definition of the c-mean estimate-function in the
next section (D100-1).

Example. Suppose that X has a ticket in a lottery and wants to make an 
estimate of his gain. His evidence e contains the following facts: for one 
hundred tickets there is one prize of $10 and fifteen prizes of $2 each; 
eighty-four tickets will not win anything; the ordinary procedure is ap- 
plied so that every ticket has an equal chance for each of the prizes. Thus, 
the possible values are: r₁ = 10, r₂ = 2, r₃ = 0. Suppose that the chosen
c-function c, as seems plausible, is such that its value for r₁ is 0.01, for
r₂ 0.15, for r₃ 0.84. Then the c-mean, according to (7), is 10 × 0.01 +
2 × 0.15 + 0 × 0.84 = 0.4. What is the meaning of accepting this value
as the estimate? It does not mean that a gain of 0.4 is the most probable
outcome. The most probable outcome is rather 0, because c is highest for
this value. And 0.4 is not even a possible outcome; the possible values are
only 10, 2, and 0. The estimate 0.4 has been determined as the probability-
weighted mean. It means that for X, on the basis of the chosen c and in
view of the available evidence e, 0.4 is the reasonable valuation for his
ticket. If X is motivated only by a sober examination of his chances and
not by the gambler’s urge for excitement, he will not buy a ticket in this
lottery for more than 40 cents or sell it for less.
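
The computation in the lottery example follows formula (7) directly and may be sketched as below. The Python fragment is an added illustration; the function name `c_mean` and the representation of the possible values as pairs (value, c-value) are assumptions, and exact fractions are used in place of decimal numerals.

```python
from fractions import Fraction


def c_mean(values_with_c):
    """Sum of the possible values, each multiplied with the c for its occurrence.

    The c-values must sum to 1, which corresponds to the result that the
    denominator of the weighted mean can be omitted.
    """
    total_c = sum(c for _, c in values_with_c)
    if total_c != 1:
        raise ValueError("the c-values of the possible values must sum to 1")
    return sum(r * c for r, c in values_with_c)


# Possible gains in dollars with their c-values on the evidence e.
lottery = [
    (10, Fraction(1, 100)),
    (2, Fraction(15, 100)),
    (0, Fraction(84, 100)),
]

estimate = c_mean(lottery)

# The most probable outcome is the value with the highest c.
most_probable = max(lottery, key=lambda pair: pair[1])[0]

print(estimate, most_probable)
```

The estimate obtained is 2/5, that is, 40 cents, which is neither the most probable outcome nor one of the possible outcomes.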

We shall use the symbol ‘e’ for the c-mean taken as an estimate-func- 
tion. We shall write ‘e(f,u,e)’ as abbreviation for ‘the c-mean estimate of 
the value of the function f for the argument u with respect to the evi-
dence e’. Thus ‘e’ is a functor in the semantical metalanguage, like ‘m’,
‘c’, etc., not a symbol of the object language, which may here be the lan-
guage of physics. For example, if ‘temp(a)’ means ‘the temperature at the
space-time point a’, then we write ‘e(temp,a,e)’ for ‘the c-mean estimate 
of the temperature at a on the evidence e’. 
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Note that we must write here ‘temp’ and ‘a’ as two separate argument ex- 
pressions, not ‘temp(a)’ as one. For the c-mean depends upon (i.e., is a func- 
tion of) the function f (here, temperature) and the argument (here, the point 
a) and not simply of the number f(u) (here, temp(a)). Note further that the 
simple use in ‘e(temp,a,e) of the symbols ‘temp’ and ‘a’ of the object language 
instead of their names is a simplification of the notation permitted by Conven- 
tion 14-2, case (b). 


Several methods are known in statistics for defining a kind of average 
or central tendency for a given frequency distribution (called probability 
distribution, in the sense of probability₂). The three most customary 
concepts of this kind are the mean, the mode, and the median. These concepts 
can be transferred to a c-distribution; the resulting concepts of inductive 
logic, for which we may use the terms ‘c-mean’, ‘c-mode’, and 
‘c-median’, have the same analogy to the three statistical concepts as 
probability₁ has to probability₂; the general character of this analogy will 
be discussed later (§ 100B). The c-mean has been defined above. A c-mode 
is any of the possible values for which c has its maximum. If r is such that 
it is just as probable that the actual value is below r as that it is above r, 
then r is a c-median (more generally speaking, r is a c-median if either (1) 
c = 1/2 for the hypothesis that the actual value f(u) < r, or (2) c < 1/2 
for the assumption that f(u) < r and c > 1/2 for the assumption that 
f(u) ≤ r). The question might be raised why just the c-mean should 
be chosen as the estimate-function rather than the c-mode or the c-me- 
dian. Theoretically either of the latter two or any other definition of a 
central value might be used as an estimate-function. However, the esti- 
mates are here meant to serve as guides for practical decisions in the sense 
discussed earlier: X is advised to act in certain respects as if he knew that 
the unknown value were equal or near to the estimate for it (R₃, § 50D), 
or it is recommended to him to maximize the estimate of his gain (R₄, 
§ 50E) or its utility (R₅, § 51A). With regard to this purpose of estimation, 
it seems doubtful whether the c-mode or the c-median could be regarded 
as adequate estimate-functions. In the above example of a lottery, the 
c-mode of X’s gain is 0, because its c is the maximum; the same holds for 
most of the actual commercial lotteries. It is obvious that this value 0 is 
unsuitable as a basis for X’s decision (see the earlier discussion of rule R₂ 
in § 50C). In the same example, the c-median is likewise 0; this holds for 
most lotteries, namely, whenever it is more probable to win nothing than 
to win something. To take another example, suppose that there is an odd 
number 2m − 1 of possible values, all of which have the same c; then the 
c-median is the mth value in the order of increasing magnitude. Thus, if the 
equiprobable values are 3, 4, and 5, the c-median is 4. This seems plausible; 
4 is also the c-mean. But for the values 3, 4, and 500, the c-median is likewise 
4, while the c-mean is 507/3 = 169. (There is no c-mode in these two 
cases.) Generally speaking, the c-median, in contradistinction to the 
c-mean, is insensitive to certain changes in the situation (namely, to any 
change, however large, in a possible value provided only that it remains 
on the same side of the original c-median). On the other hand, the decision 
of X should be influenced by such changes. For example, if the equiprob- 
able gains in a game are 3, 4, 500, then X should be willing to pay more 
for the right of participation than if they were 3, 4, 5. Thus the c-mean 
seems the most suitable among the customary concepts of a central value 
to be chosen as an estimate function. 
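
The contrast between the c-median and the c-mean in the two examples just given may be checked with the following sketch. It rests on a simplifying assumption made here, not in the text: `c_median` returns the smallest possible value at which the cumulative c reaches 1/2, which agrees with the definition above in these examples but does not enumerate all c-medians in borderline cases.

```python
from fractions import Fraction

def c_mean(dist):
    """Sum of the possible values, each multiplied by its c-value."""
    return sum(value * weight for value, weight in dist)

def c_median(dist):
    """Smallest possible value whose cumulative c reaches 1/2 (simplified)."""
    cumulative = Fraction(0)
    for value, weight in sorted(dist):
        cumulative += weight
        if cumulative >= Fraction(1, 2):
            return value
    raise ValueError("the c-values must sum to 1")

third = Fraction(1, 3)
small = [(3, third), (4, third), (5, third)]
large = [(3, third), (4, third), (500, third)]

print(c_median(small), c_mean(small))  # 4 4
print(c_median(large), c_mean(large))  # 4 169
```

Replacing 5 by 500 leaves the c-median at 4 but moves the c-mean from 4 to 169, which is the insensitivity described above.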

It is important to distinguish clearly between an estimate-function and 
the estimates which it supplies in given cases, that is, the values of the 
estimate-function for given arguments. Instead of the term ‘estimate-func- 
tion’ the term ‘estimator’ is sometimes used (Kendall [Statistics], II, 2, 
following E. J. G. Pitman). An estimate-function may be defined in such 
a way that it is applicable only to one particular kind of magnitude, say, 
the relative frequency of a property in a class or the mean or the variance 
of a magnitude within a class; in this case we call it a special estimate-func- 
tion. (An example is the straight rule for estimating relative frequency, 
which will be mentioned later.) A general estimate-function, on the other 
hand, is one applicable to different kinds of magnitude, possibly to all 
magnitudes expressible in the given object language. Fisher’s maximum 
likelihood estimate-function mentioned in the preceding section is general 
in this sense; and the c-mean estimate-function e, which was discussed in 
this section and will be defined in the next one, is general to a still greater 
extent. 


§ 100. Definition of the c-Mean Estimate-Function 


A. The Definition. If any regular c-function c is given, the c-mean estimate-function 
e is defined as follows. The estimate e of a magnitude with respect to given 
evidence e is the sum of the possible values of the magnitude, each multiplied 
by the degree of confirmation c for its occurrence on evidence e. If the magnitude 
has a continuous scale of values, an integral takes the place of the sum. 
From now on the term ‘estimate’ is to be understood in the sense of ‘c-mean 
estimate’ as just defined. 

Some simple theorems concerning estimation are given, among them the following 
(T5): if the function f is defined as a certain linear function of $f_1$, $f_2$, 
etc., then the estimate of f is the same linear function of the estimates of 
$f_1$, $f_2$, etc. 

B. Terminological Remarks. Our concept of c-mean estimate is essentially the 
same as the concept of mathematical expectation in the classical sense of this 
term. In modern statistics, however, the term ‘mathematical expectation’ has 
changed from the earlier inductive, logical sense to a statistical, empirical 
sense; this change is analogous to, and caused by, the change in the meaning 
of the word ‘probability’ from probability₁ to probability₂. 

C. The Paradox of Estimation. It is found that in general the estimate of f² is 
different from the square of the estimate of f, and similarly with other nonlinear 
functions. Each of these two values seems to be a good basis for a rational 
expectation concerning f² and a practical decision based on this expectation. 
However, since the two values are different, they seem to lead to incompatible 
expectations and decisions. This paradox is solved with the help of earlier con- 
siderations (§§ 50, 51): the decision is ultimately determined by the estimate of 
only one magnitude, the utility. 


A. Definition of the c-Mean Estimate-Function 


Our discussion in the preceding section suggests the following definition 
D1. This definition does not introduce one estimate-function but rather 
a general form which, if any regular c-function is chosen, determines one 
general estimate-function based upon it (‘general’ in the sense of ‘applicable 
to any function f’). Just as earlier a c-function was based on every 
m-function (D55-3), so here an e-function is based on every c-function. Later 
(in Vol. II), after introducing our particular c-function c*, we shall deal 
with the e-function e* based upon it; in the present chapter, however, we 
deal with e-functions in general. The object language is not specified in the 
definition; it need not necessarily be one of our systems 𝔏 but may be any 
system, provided the concept of regular c-functions is defined for that 
system in analogy to our earlier definition. 


+D100-1. (For any language system, not necessarily 𝔏.) e is the c-mean 
estimate-function (e-function) based upon the confirmation-function 
(c-function) c =Df if f is any function, u an argument of f, e any non-L-false 
sentence, $r_1, \ldots, r_n$ the possible values of f(u) with respect to e, and 
$h_1, \ldots, h_n$ the hypotheses stating these values, such that the conditions 
(1) and (2) in § 99 are fulfilled, then 

$$e(f,u,e) = \sum_{p=1}^{n} [r_p \times c(h_p,e)] .$$


What is here called the c-mean estimate of a magnitude is essentially 
the same as what often is called the mathematical expectation. (Compare, 
however, the terminological remarks below.) The term ‘estimate’ will be 
used in this chapter, unless otherwise indicated, in the sense of ‘c-mean 
estimate’, that is, in the sense of the function e defined by D1. 

The following theorem T1 follows directly from D1. (It is, so to speak, 
merely a reformulation of D1, stated for the convenience of later application.) 

T100-1. (For any language system, not necessarily 𝔏.) Let c be a c-function, 
e be based upon c, f be any function, u an argument of f, e be any 
non-L-false sentence, $r_1, \ldots, r_n$ be (or include) the possible values of 
f(u) with respect to e, and $h_1, \ldots, h_n$ be hypotheses stating these values, 
such that (1) $\vdash e \supset h_1 \lor \ldots \lor h_n$ and (2) $h_1, \ldots, h_n$ are L-exclusive in 
pairs with respect to e. Then 

$$e(f,u,e) = \sum_{p=1}^{n} [r_p \times c(h_p,e)] .$$


For the sake of simplicity we have restricted our discussion and the 
definition to cases in which the number of possible values of f(u) is finite. 
This condition is always fulfilled for the estimate of the absolute or rela- 
tive frequency of a property within a given finite class, and it is usually 
fulfilled for the estimate of the gain in a game of chance (see the example 
of a lottery in the preceding section). However, in estimations in physics 
there is usually a continuum of possible values, for instance, the totality of 
real numbers or an interval in it. In cases of this kind the probability 
distribution for the infinitely many possible values cannot be described 
simply by c, because the c for any single value is in general 0. Other methods 
must instead be used. It is clear that both the determination of degree 
of confirmation and that of estimates for magnitudes with a continuous 
scale of values require a more complex system of inductive logic. The de- 
velopment of such a system remains a task for the future. 

In our subsequent discussions we shall mostly deal with cases in which the 
number of possible values is finite. A few remarks may here be made concerning 
methods for cases with a continuum of possible values. 

1. In many cases of this kind, though not in all, a c-density function corresponding 
to c can be used, say $c'(r,e)$. It may be defined for all real numbers r, 
but its value is 0 for those numbers which are not possible values of f(u) on e. 
The connection between c and $c'$ is then as follows. If, for any values $r_1$ and $r_2$, 
$h_{12}$ is the hypothesis that f(u) lies in the interval between $r_1$ and $r_2$, then 

$$c(h_{12},e) = \int_{r_1}^{r_2} c'(r,e)\,dr .$$

In the definition of the estimate, we have now an integral instead of the sum 
mentioned in D1: 

$$e(f,u,e) = \int_{-\infty}^{+\infty} r\,c'(r,e)\,dr .$$

Let K be the set of all values r of f(u) which are possible on e; usually K is an 
interval of real numbers, but it may be any other (integrable) subset. Then it is 
sufficient to extend the integral just mentioned over K, because outside of K 
$c' = 0$. 

2. A more general method consists in using a cumulative c-distribution function 
(analogous to a cumulative frequency function in statistics), say $c_c$, defined 
for all real numbers r. Its meaning is as follows. Let $h_r$ be the hypothesis 
that $f(u) \le r$. Then $c(h_r,e) = c_c(r,e)$. Thus $c_c(r,e)$ is the degree of confirmation 
on e for the assumption that the unknown value of the magnitude in question 
does not exceed r. (If $c_c$ is differentiable throughout, its derivative $dc_c(r,e)/dr$ 
defines the density function $c'(r,e)$. Thus in this case method (1) is applicable. 
If $c'$ is taken as primitive, $c_c(r,e)$ can be defined as its integral from $-\infty$ to r.) 
Suppose that the function $c_c(r,e)$ is given, and that $h_{12}$ is the hypothesis that 
$r_1 < f(u) \le r_2$ (in other words, that the unknown value in question lies within 
the interval $(r_1, r_2)$ closed at the right end). Then $c(h_{12},e) = c_c(r_2,e) - c_c(r_1,e)$. 
The estimate function is, in this method, to be defined by a so-called Stieltjes 
integral (see, e.g., Cramér [Statistics], chap. 7): 

$$e(f,u,e) = \int_{-\infty}^{+\infty} r\,dc_c(r,e)$$

(Cramér, pp. 170 f.). 
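
The two methods for the continuous case can be compared numerically in a small sketch. Everything specific in it is an assumption made for illustration: the density 2r on the interval K = [0, 1], its cumulative function r², and the crude midpoint sums that stand in for the integrals. For this hypothetical distribution the exact estimate is 2/3.

```python
def density(r):
    """Hypothetical c-density c'(r,e): 2r on [0, 1], 0 elsewhere."""
    return 2.0 * r if 0.0 <= r <= 1.0 else 0.0

def cumulative(r):
    """Matching cumulative function: the integral of the density up to r."""
    if r <= 0.0:
        return 0.0
    if r >= 1.0:
        return 1.0
    return r * r

def estimate_from_density(lo, hi, steps=10_000):
    """Method 1: midpoint sum approximating the integral of r * c'(r,e) over K."""
    width = (hi - lo) / steps
    mids = (lo + (k + 0.5) * width for k in range(steps))
    return sum(r * density(r) * width for r in mids)

def estimate_from_cumulative(lo, hi, steps=10_000):
    """Method 2: Stieltjes-type sum of r times the increment of the cumulative function."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        left, right = lo + k * width, lo + (k + 1) * width
        total += 0.5 * (left + right) * (cumulative(right) - cumulative(left))
    return total

# c for the hypothesis that 0.25 < f(u) <= 0.5, as a difference of cumulative values
c_interval = cumulative(0.5) - cumulative(0.25)

print(estimate_from_density(0.0, 1.0))
print(estimate_from_cumulative(0.0, 1.0))
print(c_interval)
```

Both sums are taken over K only, in line with the remark that the density vanishes outside K.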

We shall now state some elementary theorems concerning estimates. 
They are simple consequences from T1 and hence from D1. In these theorems 
it is tacitly presupposed that e is any e-function based upon a regular 
c-function, that f, f′, $f_1$, etc., are functions with numerical values, that u 
is an argument of these functions, and e is a non-L-false sentence. The 
proofs, like D1, refer only to the case that the possible values are finite in 
number; therefore they use finite sums. The proofs for the more general 
case are analogous but use integrals instead of finite sums. 

T2 says that the estimate for the sum of two magnitudes is equal to the 
sum of the estimates of the two magnitudes. 

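T100-2. $e(f + f',u,e) = e(f,u,e) + e(f',u,e)$. 

Proof. Let the possible values of f(u) be $r_1, \ldots, r_m$, and those of $f'(u)$ 
$r'_1, \ldots, r'_{m'}$. Let the sentences stating these values be $h_1, \ldots, h_m$; 
$h'_1, \ldots, h'_{m'}$, respectively. Then, according to T1: 
(a) $e(f,u,e) = \sum_{p=1}^{m} [r_p \times c(h_p,e)]$; 
(b) $e(f',u,e) = \sum_{p'=1}^{m'} [r'_{p'} \times c(h'_{p'},e)]$; 
(c) 

$$e(f + f',u,e) = \sum_{p=1}^{m} \sum_{p'=1}^{m'} [(r_p + r'_{p'}) \times c(h_p \cdot h'_{p'},e)] = \sum_{p=1}^{m} \sum_{p'=1}^{m'} [r_p \times c(h_p \cdot h'_{p'},e)] + \sum_{p=1}^{m} \sum_{p'=1}^{m'} [r'_{p'} \times c(h_p \cdot h'_{p'},e)] .$$

Let this be $q_1 + q_2$. $q_1 = \sum_{p=1}^{m} [r_p \times \sum_{p'=1}^{m'} c(h_p \cdot h'_{p'},e)]$. 
Herein $\sum_{p'} = c(h_p \cdot h'_1,e) + \ldots + c(h_p \cdot h'_{m'},e)$; hence, since 
the $m'$ conjunctions are L-exclusive in pairs (§ 99, condition (2)), according to 
T59-1m: $\sum_{p'} = c[(h_p \cdot h'_1) \lor \ldots \lor (h_p \cdot h'_{m'}),e] = c[h_p \cdot (h'_1 \lor \ldots \lor h'_{m'}),e] = c(h_p,e)$ 
(§ 99(1), T59-2h). Therefore $q_1 = \sum_{p=1}^{m} [r_p \times c(h_p,e)] = e(f,u,e)$, (a). It is found 
analogously that $q_2 = e(f',u,e)$. Hence the assertion. 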

The following theorem says that, for any fixed number q, the estimate 
of q times f is equal to q times the estimate of f. 

T100-3. For any real number q, $e(q \times f,u,e) = q \times e(f,u,e)$. 

Proof. Let $r_p$ and $h_p$ (p = 1, ..., m) be as in T1. Then (T1): 
$e(q \times f,u,e) = \sum_{p=1}^{m} [q \times r_p \times c(h_p,e)] = q \times \sum_{p=1}^{m} [r_p \times c(h_p,e)] = q \times e(f,u,e)$. 

The following theorem refers analogously to the addition of a fixed 
number q. 

T100-4. For any real number q, $e(f + q,u,e) = e(f,u,e) + q$. (Proof 
analogous to T3.) 


The following is the most important of these theorems. It says that the 
estimate for any linear function of the values of given functions is equal to 
the same linear function of the estimates for the given functions. 


+T100-5. Let f be defined in terms of given functions $f_1, \ldots, f_n$ as 
follows: $f(u) = q_0 + q_1 \times f_1(u) + \ldots + q_n \times f_n(u)$, where $q_0, \ldots, q_n$ are 
any fixed real numbers. Then $e(f,u,e) = q_0 + q_1 \times e(f_1,u,e) + \ldots + q_n \times e(f_n,u,e)$. 
(From T3, T2, T4.) 
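
The linearity stated in T5 can be checked on a small finite case. The sketch below is an illustration under assumptions made here: the joint c-distribution `joint` over the values of two functions and the coefficients `q0`, `q1`, `q2` are hypothetical numbers, and exact fractions are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint c-distribution: (value of f1(u), value of f2(u), c-value)
joint = [
    (1, 10, Fraction(1, 2)),
    (2, 10, Fraction(1, 4)),
    (2, 30, Fraction(1, 4)),
]

def estimate(magnitude):
    """c-mean estimate of a magnitude given as a function of (f1(u), f2(u))."""
    return sum(magnitude(r1, r2) * weight for r1, r2, weight in joint)

q0, q1, q2 = 5, 3, Fraction(1, 2)

# Estimate of the linear function f = q0 + q1*f1 + q2*f2, computed directly ...
direct = estimate(lambda r1, r2: q0 + q1 * r1 + q2 * r2)
# ... and the same linear function applied to the separate estimates.
via_parts = q0 + q1 * estimate(lambda r1, r2: r1) + q2 * estimate(lambda r1, r2: r2)

print(direct, via_parts)  # 17 17
```

The two results coincide even though the values of the two functions are not independent in the assumed distribution; T5 requires no such independence.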


B. Some Terminological Remarks 


The nature of the concepts defined or indicated in this section may per- 
haps be clarified by comparing and contrasting them with certain con- 
cepts in modern mathematical statistics which are analogous to our con- 
cepts but different from them. The fundamental difference consists in the 
fact that our concepts are based upon probability₁ and hence are concepts 
of inductive logic, while the corresponding concepts in statistics are based 
upon probability₂, that is, relative frequency in the whole population, and 
hence are empirically determined magnitudes. We have earlier (§ 9) mentioned 
the fact that there are two contemporary schools who use the word 
‘probability’ in the sense of ‘probability₂’: (1) the probability theories of 
Mises and Reichenbach, who define ‘probability’ as ‘limit of relative fre- 
quency in an infinite sequence’, and (2) modern mathematical statistics, 
where ‘probability’ is likewise understood as ‘relative frequency in an in- 
finite population’, but is used as an undefined term not involving the con- 
cept of limit. We have further discussed (§ 42A) the historical fact that 
nearly all authors in these two schools seem to be unaware of the fact that 
their meaning of the term ‘probability’ (viz., probability₂) is fundamentally 
different from the meaning which the same word has for the classical 
authors and their followers (viz., probability₁). Consequently they have 
taken over from the earlier authors a number of definitions of other 
terms based on the term ‘probability’; and here again they seem not to 
realize that they merely copy the words of the old definitions but thereby 
assign to the defined terms meanings quite different from the original ones. 
This holds in the first place for the term ‘mathematical expectation’. This 
and similar terms (‘mathematical hope’, etc.) have been used in the 
classical theory of probability, especially with respect to games of chance, 
either for the product of a possible gain and its probability or for the sum 
of all these products for all possible cases (‘espérance totale’, ‘total 
expectation’). Thus our concept of the estimate of a magnitude is essentially 
the same as the classical concept of mathematical expectation. It seems 
that the classical meaning of the term ‘mathematical expectation’ remained 
in use throughout the last century and likewise with those authors of our 
time who deal with probability₁ (e.g., Keynes [Probab.], pp. 311 ff., 
Jeffreys [Probab.], p. 42). However, when we come to the authors in modern 
statistics, we find that the meaning of the term changes radically. Terms 
like ‘mathematical expectation’, ‘expected value’, ‘mean value’ are defined 
in an apparently similar way as previously, that is, as a sum (or integral) 
of the products of the possible values and their probabilities (see, for in- 
stance, Wilks [Statistics], p. 29, Wolfenden [Statistics], p. 12, Cramér 
[Statistics], p. 170). Since, however, ‘probability’ means now probability₂, 
the concept defined is no longer a concept of inductive logic, dependent 
upon evidence, but rather a function whose values are determined by ob- 
servation. It is descriptive of certain facts irrespective of anybody’s 
knowledge about them. It is therefore without any inductive significance 
for presumption or expectation. Although the new concept may be inter- 
esting and fruitful and hence acceptable in its own right, both the terms 
‘mathematical’ and ‘expectation’ seem strange misnomers, especially the 
latter. The fact that the authors use the term ‘expectation’ for the new 
concept can be explained only by their erroneous belief that in adopting 
this term they follow the traditional usage. Let us use (only in the present 
discussion) the term ‘expectation₁’ for the inductive concept based on 
probability₁, and ‘expectation₂’ for the statistical concept based on 
probability₂. In order to clarify the difference between these two concepts, let 
us go back to the example of the lottery in the preceding section. Here 
both the expectation₁ and the expectation₂ of X’s gain have the same 
numerical value 0.4; but nevertheless the meanings are different. The 
statement 

(1) ‘The expectation₁ of X’s gain with respect to e is 0.4’, 

where e is the evidence described earlier, is analytic. Suppose that e is false 
and that actually there are not one hundred but two hundred tickets. 
Then the statement mentioned is still true, and the value 0.4 of the 
expectation₁ is still valid (with respect to the erroneous evidence e). On the 
other hand, the statement 

(2) ‘The expectation₂ of X’s gain is 0.4’ 

is of an entirely different nature. It is factual and empirical. It does not 
contain any reference to e, but it can be inferred from e (which is itself 
factual). Suppose again that e is false, that there are two hundred tickets 
but that the numbers and amounts of positive prizes are as stated in e. 
Then the statement (2) is false; the expectation₂ is not 0.4 but 0.2. If X’s 
beliefs based on his observations are formulated in e, then his presumption 
of gain is determined (if he is a rational man) by the value 0.4 of the 
expectation₁; it is not influenced by the expectation₂. This is especially clear 
in the second case mentioned, where X’s belief is erroneous and the 
expectation₂ is actually 0.2; since this value is, in this case, not known to X, 
it cannot influence his presumption. This shows that the term ‘expectation’ 
when used in the statistical sense of ‘expectation₂’ is a misnomer. 
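
The two numerical values in this comparison can be retraced as follows. The sketch is a simplification made here for illustration: both values are computed by the same hypothetical helper `mean_gain`, once from the ticket count stated in e and once from the supposed actual count, whereas expectation₂ is properly an empirical magnitude and not a computation from anyone's evidence.

```python
from fractions import Fraction

def mean_gain(prizes, tickets):
    """Weighted mean gain when every ticket has an equal chance for each prize."""
    return sum(Fraction(amount * count, tickets) for amount, count in prizes)

prizes = [(10, 1), (2, 15)]   # (amount in dollars, number of such prizes), as stated in e
believed_tickets = 100        # the number of tickets according to e
actual_tickets = 200          # the supposed actual number, unknown to X

expectation_1 = mean_gain(prizes, believed_tickets)  # relative to the evidence e
expectation_2 = mean_gain(prizes, actual_tickets)    # relative to the actual facts

print(expectation_1, expectation_2)  # 2/5 1/5
```

Only the first value, 2/5 = 0.4, is available to X and can guide his presumption; the second, 1/5 = 0.2, depends on a fact he does not know.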

The situation with respect to the concept of a c-density function (mentioned 
above in the passage in small print) is similar. I might have taken 
the term ‘probability density function’ were it not for the fact that this 
term is used in modern statistics in the sense of probability₂ density. The 
latter concept is again an empirically determined function descriptive of 
an actual physical distribution of the values of a magnitude in an infinite 
population irrespective of any knowledge or evidence of an observer. 

The term ‘probable error’ has undergone a similar change in meaning. 
For the classical authors and those modern authors who deal with 
probability₁ it means “the amount, which the difference between the actual 
value of the quantity and its most probable value is as likely as not to 
exceed” (Keynes [Probab.], p. 74); and we shall use it in a similar sense 
(§ 102). Contemporary statisticians, on the other hand, use the same term 
for the corresponding statistical concept; this is the amount δ such that, 
in the actual distribution of the magnitude in question in the population, 
irrespective of whether anybody knows it or not, one half of the cases lie 
within the interval m ± δ, where m is the mean value of the magnitude 
in the population. Keynes complains that statisticians often mix both 
uses of the term ‘probable error’ and generally slip somewhat easily from 
descriptive-statistical to inductive statements (op. cit., pp. 327 ff.). I think 
that statisticians today are more careful in their formulations in this point 
than the earlier authors whom Keynes criticized. But they avoid the 
previous ambiguity chiefly by restricting themselves in most cases to the 
use of statistical concepts, sometimes to the neglect of inductive concepts, 
which are equally important for their problems. 


C. The Paradox of Estimation 


We found (T5) that, if f is defined in terms of $f_1$, $f_2$, etc., in the form of 
a linear function, then it makes no difference whether we determine the 
estimate for f directly or whether we first determine the estimates for 
$f_1$, $f_2$, etc., and then apply the linear function to them. It is important to 
notice that the same does not in general hold for a nonlinear function. 
This is shown by the following counterexample. 

Let the possible values of f(u) be 1, 2, 3. Let c have equal values, hence 1/3, 
for these three cases. Therefore, the estimate is 2, the mean of the three values. 
[e(f,u,e) = 1 × 1/3 + 2 × 1/3 + 3 × 1/3 = 2.] Hence e² = 4. On the other 
hand, the possible values for f² are 1, 4, 9. c is again equal for them; hence the 
estimate for f²(u) is their mean, that is, 14/3. [e(f²,u,e) = 1 × 1/3 + 4 × 1/3 + 
9 × 1/3 = 14/3.] This, however, is different from 4, the square of the estimate 
for f. 
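
The counterexample can be verified mechanically. In the following sketch the names `dist` and `estimate` are labels chosen here; the values and c-values are those of the counterexample, written as exact fractions.

```python
from fractions import Fraction

third = Fraction(1, 3)
dist = [(1, third), (2, third), (3, third)]  # possible values of f(u) with their c

def estimate(g):
    """c-mean estimate of g(f(u)) for a function g of the value of f(u)."""
    return sum(g(value) * weight for value, weight in dist)

square_of_estimate = estimate(lambda r: r) ** 2
estimate_of_square = estimate(lambda r: r * r)

print(square_of_estimate, estimate_of_square)  # 4 14/3
```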

As the example shows, in general the square of an estimate is not the same 
as the estimate of the square. Likewise, the estimate of the product f X g is 
in general not the same as the product of the estimates of f and of g. And 
analogously with other nonlinear functions. This fact raises a serious prob- 
lem for the application of the method of estimation. Suppose that the 
observer X has chosen a certain c-function c, which determines a certain 
e-function e. He possesses a certain amount of evidence e. He has to make 
practical decisions and he wants to base them on the estimates with re- 
spect to the evidence e. Suppose a particular decision depends upon what 
he presumes the value of f²(u) to be. Let us assume that the conditions of 
the above example hold. Then there are two possible ways for X to determine 
that value of f²(u) which he may rationally expect on the basis of 
his evidence e and therefore take as basis for his practical decision. (1) He 
finds that the estimate for f(u) on e is 2; therefore he decides to act as 
though he knew that f(u) is 2 and hence f²(u) is 4. (2) As an alternative, 
he applies the procedure of estimation directly to f²; he finds that the 
estimate of f²(u) on e is 14/3; therefore he thinks he ought to act as though 
he knew that f²(u) is 14/3. However, these two decisions are incompatible. 
Two incompatible expectations for f²(u) are obtained on the basis of the 
same evidence e and with the help of the same estimation function e by 
two procedures which apparently are both correct. We propose to call this 
situation the paradox of estimation. 

If a system of inductive logic for languages containing quantitative 
magnitudes is to be constructed, which is not intended in this book, and 
if rules for the application of this inductive logic to given knowledge situa- 
tions are to be laid down, then this paradox must be eliminated. 

One way of achieving this would consist in choosing the c-median in- 
stead of the c-mean as the estimate-function. The square of the c-median 
of the possible values (if these are nonnegative) is the same as the c-median 
of their squares; in the above example, the c-median of the equiprobable 
values 1, 2, 3 is 2, and the c-median of their squares, 1, 4, 9, is 4. The same 
holds for any other monotonic increasing function. However, it seems to 
me that this advantage of the c-median as an estimate-function is far 
outweighed by its serious disadvantages, above all its insensitivity to certain 
practically relevant changes in the situation, as explained earlier (near 
the end of § 99). 
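
That the c-median, unlike the c-mean, commutes with squaring in this example can be checked as follows. The sketch again uses a simplified `c_median` (the smallest possible value at which the cumulative c reaches 1/2), which is an assumption made here and suffices for the equiprobable values 1, 2, 3.

```python
from fractions import Fraction

def c_mean(dist):
    """Sum of the possible values, each multiplied by its c-value."""
    return sum(value * weight for value, weight in dist)

def c_median(dist):
    """Smallest possible value whose cumulative c reaches 1/2 (simplified)."""
    cumulative = Fraction(0)
    for value, weight in sorted(dist):
        cumulative += weight
        if cumulative >= Fraction(1, 2):
            return value
    raise ValueError("the c-values must sum to 1")

third = Fraction(1, 3)
values = [(1, third), (2, third), (3, third)]
squares = [(value * value, weight) for value, weight in values]

print(c_median(values) ** 2, c_median(squares))  # 4 4
print(c_mean(values) ** 2, c_mean(squares))      # 4 14/3
```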

It seems to me that the paradox can be solved without abandoning the 
c-mean as the estimate-function. Let us first look at the situation from a 
purely theoretical point of view, leaving aside the problem of practical 
decisions. Then it must be said that each of the two estimation procedures 
is correct. Their results do not contradict each other because they are 
answers to two distinct questions. In the above example the two answers 
are: 

‘The estimate of f is 2; hence the square of the estimate of f is 4’ 
and 
‘The estimate of f² is 14/3’. 


These two statements appear as incompatible only if the estimates are 
interpreted, incorrectly, as the most reasonable expectations. However, 
an estimate must not be understood as a prediction but only as a weighted 
mean. This is especially clear in those cases (as in the example of the lot- 
tery in § 99) where the estimate is not a possible value. If the estimate 
statements are interpreted correctly, there is no contradiction and hence 
no paradox. 

The paradox appears as more serious when we turn from the theoreti- 
cal to the practical question. X asks for advice as to which decision he 
ought to take. He will not be satisfied if we tell him that, from one point 
of view, he should act as if he knew that f = 2, in other words, that f² = 4, 
but, from another point of view, he should act as if he knew that f² = 
14/3. He can take only one decision. Although the two answers to the 
theoretical questions of estimation are not incompatible, the two sugges- 
tions for a practical decision are indeed incompatible. The solution is 
found with the help of our earlier analysis of rules for the application of 
inductive logic and, in particular, of estimates for the determination of 
practical decisions (§§ 50, 51). We saw that the customary rule: ‘Act as if 
you knew that the unknown value of the magnitude in question were equal 
or near to the estimate’ (Rule R₃, § 50D) deserves its widespread acceptance 
to a certain extent, because it is not only simple and convenient but 
also in many cases adequate, that is, leading to a reasonable decision. 
However, we found that it is not adequate in all cases. In order to come to 
generally adequate results, another rule must be applied which tells X to 
take that decision for which the estimate of his gain, expressed on a monetary 
scale, is a maximum (Rule R₄, § 50E); if the possible gains and losses 
are not small in relation to X’s present fortune, then a still further refined 
rule must be applied which prescribes the maximization not of the gain 
itself but of its utility, that is, the amount of satisfaction which X derives 
from the gain (Rule R₅, § 51A). If X follows either of the two 
last-mentioned rules, the paradox of estimation disappears because X will 
apply the procedure of estimation to the values of only one magnitude. 


§ 102. The Problem of the Reliability of an Estimate 


Some estimates are more reliable than others, that is to say, there is more 
reason for the expectation that the error of the estimate, that is, the difference 
between it and the actual value of the magnitude in question will turn out to 
be small. The problem is to explicate this concept of reliability. Several methods 
of explication are discussed. 

Suppose that the observer X has chosen a function c and, based upon it, 
an estimate-function e. Suppose further that he has calculated the estimate 
e(f,u,e) for f(u) with respect to his evidence e. Then it will be of interest 
to him to determine the precision or reliability of this estimate. It is 
clear that estimates may vary greatly in their reliability, that is, the prob- 
ability that the actual value of f(u) (in the terminology of statisticians, 
the “true value”) is close to the estimated value. For instance, we shall 
obviously have much more confidence in the estimate of the relative fre- 
quency of red-haired people among the inhabitants of Chicago based upon 
an observed sample of 10,000 persons than in an estimate based on a 
sample of only 100 persons. Thus we know practically, though inexactly, 
what we mean when we regard one estimate as more reliable than another. 
Our task is now to find an exact explicatum for the inexact concept of re- 
liability as an explicandum. There are several possible methods for an 
explication. We shall discuss three of them which are related to customary 
conceptions and which seem promising; we shall then adopt a form of the 
third method. In order to explain the three methods, let us take the fol- 
lowing example. On the basis of the description ¢ of a certain sample, X has 
found as estimate for the rf (relative frequency) of the property M in 
a given population the value e = 0.27. Let us suppose that the whole 
population is infinite, and rf is taken as the limit of the relative frequency 
with respect to a fixed serial order of the individuals (cf. § 95). In this 
case the possible values of rf form a continuous scale. If the population is
finite but sufficiently large (e.g., the inhabitants of Chicago), the possible
values of rf, although finite in number, lie so close to each other that
dealing with them as if they formed a continuous scale is a good approximation.

1. X considers an interval e ± δ around his estimate e(f,u,e) (for example,
with δ = 0.02, the interval between 0.25 and 0.29). Then he
determines the value of c, on his evidence e, for the hypothesis h_δ that rf
lies within this interval: c_δ = c(h_δ,e). If he finds that c_δ is large, then he
knows that it is very probable that the actual value of rf is close to the
estimate e; hence c_δ measures the reliability of this estimate. However,
X must clearly take into consideration not c_δ alone but c_δ in relation to
the size 2δ of the interval (in our case, 0.04). For if the interval size 2δ is
increased, c_δ will in general be increased too. Therefore X might consider
taking the quotient c_δ/2δ. But since this again varies with δ, it seems more
appropriate to take the limit toward which this quotient converges when
smaller and smaller intervals are taken, provided this limit exists. Thus
we may define the reliability of the estimate as

    \lim_{\delta \to 0} (c_\delta / 2\delta) .

[This is the same as the confirmation density c' for the value e(f,u,e)
(§ 100A).]
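
The first method can be illustrated numerically. The following sketch assumes, purely for the sake of example, that the c-values over the possible values of rf follow a normal distribution centred on the estimate 0.27 with an arbitrarily chosen spread of 0.05; neither assumption comes from the text, and the names `c_delta` and `quotient` are hypothetical. It shows the quotient c_δ/2δ approaching the density at the estimate as δ shrinks.

```python
from statistics import NormalDist

# Hypothetical confirmation distribution for rf (an assumption of this sketch):
# normal, centred on the estimate 0.27, with an arbitrary spread of 0.05.
ESTIMATE = 0.27
dist = NormalDist(mu=ESTIMATE, sigma=0.05)

def c_delta(delta: float) -> float:
    """c for the hypothesis that rf lies in the interval ESTIMATE +/- delta."""
    return dist.cdf(ESTIMATE + delta) - dist.cdf(ESTIMATE - delta)

def quotient(delta: float) -> float:
    """The quotient c_delta / (2 * delta) considered in the first method."""
    return c_delta(delta) / (2 * delta)

# Value of the (assumed) confirmation density at the estimate itself.
density_at_estimate = dist.pdf(ESTIMATE)

for delta in (0.02, 0.01, 0.001, 0.0001):
    print(f"delta={delta:<7} c_delta={c_delta(delta):.6f} "
          f"quotient={quotient(delta):.4f}")
print(f"density at the estimate: {density_at_estimate:.4f}")
```

Widening the interval raises c_δ, which is why c_δ alone cannot serve as the measure; the quotient removes this dependence only in the limit.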

2. The second method proceeds as follows. We have seen that for any
interval e ± δ around the estimate e there is a value c_δ for the c of the
hypothesis that the actual value f(u) lies in this interval. If the interval
is very small, c_δ is small; if the interval covers all possible values of f(u),
c_δ is 1. For intermediate intervals, c_δ will have intermediate values. Instead
of choosing an interval and then determining its c_δ, as in the first
method, X may choose once for all a fixed value between 0 and 1, and then
determine the interval e ± δ for which c_δ has this value. Suppose he
chooses c_δ = 1/2; then the corresponding interval e ± δ has the characteristic
property that it is, on the evidence e, just as probable for the actual
value of f(u) to lie within this interval as without. Let us call the value δ
defined in this way the probable error of the estimate e. The smaller the
probable error, the more reliable is the estimate. The concept here defined
is essentially the same as that for which the term ‘probable error’ was
originally used in the theory of probability; in modern statistics, however,
the same term is used, not for this inductive concept but for an
analogous statistical concept (see § 100B). [In many cases the confirmation
density c' is not symmetrical on both sides of the value e. In cases of
this kind a better characterization of the reliability of the estimate is
given by distinguishing between the lower probable error δ_1 and the upper
probable error δ_2. They are defined in such a manner that the c both
for the interval between e − δ_1 and e and for that between e and e + δ_2 is
1/4; hence the c for the whole interval around e is here again 1/2, but e is
here not necessarily in the middle of the interval.]
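
The second method can be sketched in the same spirit. The code below again assumes a hypothetical symmetric (normal) confirmation distribution with arbitrary parameters, which is not part of the text, and finds by bisection the δ for which the c of the interval is 1/2. The helper names are illustrative only.

```python
from statistics import NormalDist

# Hypothetical symmetric confirmation distribution (an assumption of this sketch).
ESTIMATE = 0.27
SIGMA = 0.05
dist = NormalDist(mu=ESTIMATE, sigma=SIGMA)

def c_for_interval(delta: float) -> float:
    """c for the hypothesis that the actual value lies in ESTIMATE +/- delta."""
    return dist.cdf(ESTIMATE + delta) - dist.cdf(ESTIMATE - delta)

def probable_error(target: float = 0.5, tol: float = 1e-12) -> float:
    """Bisection for the delta at which c_for_interval(delta) equals target."""
    lo, hi = 0.0, 10 * SIGMA
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if c_for_interval(mid) < target:
            lo = mid   # interval still too narrow
        else:
            hi = mid   # interval wide enough, try a narrower one
    return (lo + hi) / 2

delta_half = probable_error()
print(f"probable error: {delta_half:.5f}")
print(f"c for the interval: {c_for_interval(delta_half):.6f}")
```

For a normal distribution the result is roughly 0.6745 times the spread; for an asymmetrical density the same search would have to be run separately on each side with target 1/4.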

3. The third method seems to be the most adequate of the three. The
first two methods take into consideration only the c for those of the pos- 
sible values of f which lie in the neighborhood of the estimate e of f. 
Often c is rather low for the more remote values of f (especially if e de- 
scribes not a sample but the whole population, as in the direct inference, 
§§ 94-96). However, in other cases c may be relatively high even for 
remote values; and in these cases the third method may give a more ade- 
quate characterization of the nature of the estimate. Instead of asking: 
‘Is it very probable that the actual value of f is near to the estimate we 
have calculated, in other words, that the error we have made by our 
estimate is small?’ we shall now ask: ‘How small is this error presumably; 
in other words, what is the estimate for the error of our estimate for f?’ 
This method can be applied also if the number of possible values of f is 
finite and even very small. We assume in the following that it is a finite
number n. Analogous definitions can be constructed for a continuous scale
of values, on the basis of the methods indicated earlier (§ 100A). 

By the error of the estimate for f(u) we mean the difference between the 
estimated value and the actual value of f(u): 


D102-1. The error of the estimate e(f,u,e):
    v(f,u,e) =Df e(f,u,e) − f(u) .


Let r_1, ..., r_n, as previously (§ 99), be the possible values of f(u) on e,
and h_1, ..., h_n be the hypotheses stating these values. Suppose X has
calculated the estimate of f(u) on his evidence e and has found the value r':

(1)  e(f,u,e) = \sum_{p=1}^{n} [r_p \times c(h_p,e)] = r' .

The actual value r of f(u) is either r_1 or r_2 or ... or r_n. If r = r_p, then the
error of X's estimate r' is

(2)  v_p =Df r' − r_p .

This error is positive if the estimate r' is too high; it is negative if the estimate
is too low. The actual error v of the estimate, that is, the difference
between the estimated value r' and the actual value r of f(u), is, of course,
unknown to X, since the actual value r of f(u) is not known. But just as
he can determine the estimate r' for the unknown r, so he can apply the
same general method of estimation in order to determine the estimate for v
or for simple functions of v. We shall study: (A) the estimate for v itself;
(B) that for | v |, that is, the absolute value of v (irrespective of sign),
and (C) the estimate for v².


A. We write ‘c_p’ as short for ‘c(h_p,e)’; this is the c for the case in which
f(u) has the value r_p and hence the error v is v_p = r' − r_p. Hence the
estimate of the error v of the estimate r' for f(u) is:

(3)  e(v,f,u,e) = \sum_{p=1}^{n} [v_p \times c_p] .

We find easily the following result.

T102-1. e(v,f,u,e) = 0.

Proof. According to (2), the sum in (3) is \sum_p [(r' - r_p) \times c_p], hence
r' \times \sum_p c_p - \sum_p [r_p \times c_p]. Of the two sums here occurring,
the first is 1, the second is r' (by (1)). Hence the result is r' − r' = 0.

T1 says that the estimate of the error itself of any estimate is 0. This
means that the estimate of any function f is such that the possible positive
errors and the possible negative errors, each weighted with its c, cancel
each other out. This result is interesting because it states an important
characteristic of the estimate-function e; but it shows that the estimate
of v cannot be used for measuring the reliability of the estimate for f.
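
The cancellation stated in T102-1 can be checked on a small case. The possible values and c-values below are made up for illustration (any c-values summing to 1 would do), and `estimate` is a hypothetical helper implementing the c-mean of formula (1).

```python
from fractions import Fraction

def estimate(values, weights):
    """c-mean estimate: the sum of value * c over all possible values."""
    return sum(r * c for r, c in zip(values, weights))

# Illustrative possible values r_p and c-values c_p (the c-values sum to 1).
r = [Fraction(1), Fraction(2), Fraction(5)]
c = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
assert sum(c) == 1

r_prime = estimate(r, c)                 # formula (1): the estimate r'
errors = [r_prime - r_p for r_p in r]    # formula (2): v_p = r' - r_p
estimated_error = estimate(errors, c)    # formula (3): estimate of v

print("estimate r':", r_prime)
print("estimate of the error:", estimated_error)
```

Exact fractions are used so that the result is exactly zero rather than zero up to rounding.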


B. The estimate for the absolute value | v | of the error v is

(4)  e(| v |,f,u,e) = \sum_{p=1}^{n} [|r' - r_p| \times c_p] .

Now we divide the possible values of f(u), viz., r_1, ..., r_n, into two
classes: the first contains those values r_p for which r_p ≤ r', and hence
v_p ≥ 0 and | v_p | = v_p; the second contains those for which r_p > r',
hence v_p < 0, and | v_p | = −v_p. In the following theorem, ‘Σ_1’ is meant
to extend over the first class and ‘Σ_2’ over the second.

T102-2. e(| v |,f,u,e) = e(f,u,e) [Σ_1 c_p − Σ_2 c_p] − Σ_1 [r_p × c_p] +
Σ_2 [r_p × c_p].

Proof. The sum in (4) is
Σ_1 [(r' − r_p) c_p] + Σ_2 [(r_p − r') c_p] = r' × Σ_1 c_p − Σ_1 [r_p × c_p]
+ Σ_2 [r_p × c_p] − r' × Σ_2 c_p.
Hence the assertion.
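
As a check on T102-2, the following sketch computes the estimate of | v | directly from formula (4) and again through the two-class decomposition. The values and c-values are invented for illustration.

```python
from fractions import Fraction

# Illustrative possible values r_p and c-values c_p (assumed, summing to 1).
r = [Fraction(1), Fraction(2), Fraction(5)]
c = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

r_prime = sum(r_p * c_p for r_p, c_p in zip(r, c))

# Formula (4): direct computation of the estimate of |v|.
direct = sum(abs(r_prime - r_p) * c_p for r_p, c_p in zip(r, c))

# T102-2: split into the first class (r_p <= r') and the second (r_p > r').
first = [(r_p, c_p) for r_p, c_p in zip(r, c) if r_p <= r_prime]
second = [(r_p, c_p) for r_p, c_p in zip(r, c) if r_p > r_prime]
via_theorem = (
    r_prime * (sum(c_p for _, c_p in first) - sum(c_p for _, c_p in second))
    - sum(r_p * c_p for r_p, c_p in first)
    + sum(r_p * c_p for r_p, c_p in second)
)

print("direct:", direct)
print("via T102-2:", via_theorem)
```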


C. The estimate of the square error v² is:


(5)  e(v²,f,u,e) = \sum_{p=1}^{n} [(r' - r_p)^2 \times c_p] .


We shall adopt this method as our explication of the reliability of an 
estimate. It will be discussed in the next section. 


§ 103. The Estimated Square Error of an Estimate


The last of the methods explained in the preceding section is adopted for ex- 
plicating the reliability of an estimate. It consists in applying the general meth- 
od of c-mean estimation (§ 99) to the square of the error. 


The development of the statistical theory of errors since Gauss has 
shown that, at least in all those cases in which the distribution is known or 
assumed to be normal, the square error is a more fruitful concept than the 
error itself or its absolute value. Therefore the last of the methods dis- 
cussed in the preceding section seems especially suitable as an explicatum 
for reliability. We now adopt it as our explicatum; that is to say, if an 
estimate r' of f has been determined, then we take as measure of its reliability
the estimate of its square error. (Both estimates are, of course,
meant as c-mean estimates, in the sense of the function e.) The square
root of this estimate of the square error serves, of course, just as well.
The latter concept is the inductive analogue to, but more general than,
the statistical concept usually called the standard deviation σ (square
root of the variance); we shall call it the estimated standard error and
designate it by ‘𝔣’; hence the estimated square error is 𝔣².


+D103-1.
a. The estimated square error of e(f,u,e), in symbols: 𝔣²(f,u,e), =Df
e(v²,f,u,e).
b. The estimated standard error of e(f,u,e), in symbols: 𝔣(f,u,e), =Df
√e(v²,f,u,e).

The following theorem shows how 𝔣² can be determined from e(f) and
e(f²) without the use of v.


+T103-1.
a. 𝔣²(f,u,e) = e(f²,u,e) − e²(f,u,e).

Proof. Let us first determine the estimate of the square of the difference between
an arbitrarily chosen fixed number q (instead of r') and r, i.e., f(u). Later
we shall substitute r' for q. The estimate mentioned is
\sum_p [(q - r_p)^2 \times c_p] = \sum_p [(q^2 - 2 q r_p + r_p^2) \times c_p]
= q^2 \times \sum_p c_p - 2q \times \sum_p [r_p \times c_p] + \sum_p [r_p^2 \times c_p].
The first of the latter three sums is 1, the second is r'; hence the whole is

(1)  q^2 - 2 q r' + \sum_p [r_p^2 \times c_p] .

This will be used later. Now we substitute r', that is, e(f), for q; the first two
terms become r'² − 2r'² = −r'². The third term in (1) is the estimate for r²,
hence e(f²). Hence the assertion.


b. 𝔣(f,u,e) = √(e(f²,u,e) − e²(f,u,e)). (From (a).)
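
The identity in T103-1a can be verified on a small case. The sketch below uses invented values and c-values and a hypothetical helper `estimate`; it computes the estimated square error once from its definition (the estimate of v²) and once as e(f²) − e²(f).

```python
from fractions import Fraction
from math import sqrt

def estimate(values, weights):
    """c-mean estimate: the sum of value * c over all possible values."""
    return sum(v * w for v, w in zip(values, weights))

# Illustrative possible values r_p and c-values c_p (assumed, summing to 1).
r = [Fraction(1), Fraction(2), Fraction(5)]
c = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

e_f = estimate(r, c)                          # e(f), i.e. r'
e_f2 = estimate([x ** 2 for x in r], c)       # e(f^2)

# Definition D103-1a: the estimate of the square error v^2.
square_error_by_definition = estimate([(e_f - x) ** 2 for x in r], c)
# Theorem T103-1a: e(f^2) - e(f)^2.
square_error_by_theorem = e_f2 - e_f ** 2
# Theorem T103-1b: the estimated standard error.
standard_error = sqrt(square_error_by_theorem)

print(square_error_by_definition, square_error_by_theorem, round(standard_error, 4))
```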


It was remarked earlier that we should clearly distinguish between the
estimate of f² and the square of the estimate of f. We see now that (except
for the trivial case that there is only one possible value of f) the first of
these two values is always greater than the second and that their difference
is the estimated square error.

The following theorem says that the estimate e(f) has this characteristic
property: the estimate of the square of the difference between any fixed
number q and f(u) has its smallest value if we take e(f) as q.


T103-2. e[(q − f(u))²,e] varies with q in such a way that it is a minimum
for q = e(f,u,e).

Proof. The estimate mentioned has been determined above as (1). By differentiating
(1) partially with respect to q we obtain 2q − 2r'. This is 0 only for
q = r'. The second differential coefficient is 2, hence positive. Therefore (1) has
a minimum for q = r'.
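
The minimum property of T103-2 can also be seen numerically. This sketch, with invented values and c-values, evaluates the estimate of (q − f(u))² on a grid of candidate numbers q and confirms that the smallest value is reached at q = e(f). A grid search is used only for illustration; the proof above settles the matter by differentiation.

```python
from fractions import Fraction

# Illustrative possible values r_p and c-values c_p (assumed, summing to 1).
r = [Fraction(1), Fraction(2), Fraction(5)]
c = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

def estimated_square_difference(q):
    """Estimate of (q - f(u))^2 for a fixed number q."""
    return sum((q - r_p) ** 2 * c_p for r_p, c_p in zip(r, c))

e_f = sum(r_p * c_p for r_p, c_p in zip(r, c))

# Candidate values q = 0.00, 0.01, ..., 6.00 as exact fractions.
grid = [Fraction(k, 100) for k in range(0, 601)]
best_q = min(grid, key=estimated_square_difference)

print("e(f) =", e_f, " best q on the grid =", best_q)
```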
Examples. 1. The earlier example (beginning of § 101): r_p = 1, 2, 3 with
equal values c_p = 1/3. We find, as previously, e(f) = 2, hence e²(f) = 4;
e(f²) = 14/3.

    p =           1      2      3      Sum
    r_p           1      2      3
    c_p           1/3    1/3    1/3
    r_p c_p       1/3    2/3    3/3    2; this is e(f).
    r_p²          1      4      9
    r_p² c_p      1/3    4/3    9/3    14/3; this is e(f²).
    v_p           1      0      −1
    v_p²          1      0      1
    v_p² c_p      1/3    0      1/3    2/3; this is e(v²).

Further, 𝔣² = e(v²) = 2/3; this is indeed = e(f²) − e²(f), in accordance
with T1a. The estimated standard error 𝔣 is √(2/3) = 0.82.

2. We take the same r_p-values, but with c_p = 1/5, 3/5, 1/5; thus the outside
values are less probable than the middle value. (We omit in the table those
lines which are as above.) We find, as in the first example, e(f) = 2, hence
e²(f) = 4. e(f²) = 22/5.

    p =           1      2      3      Sum
    c_p           1/5    3/5    1/5
    r_p c_p       1/5    6/5    3/5    2; this is e(f).
    r_p² c_p      1/5    12/5   9/5    22/5; this is e(f²).
    v_p² c_p      1/5    0      1/5    2/5; this is e(v²).

𝔣² = e(v²) = 2/5; this is again = e(f²) − e²(f). 𝔣 = 0.63.
𝔣 is here smaller than in the first case. Thus the estimate of f, although it has
the same value 2, is here more reliable than in the first case. This is plausible
because the probability for the outside values and thus for a discrepancy between
the actual and the estimated value of f is here smaller than in the first
case.

3. Finally, we take the same r_p-values with c_p = 2/5, 1/5, 2/5; thus the
outside values are here more probable than the middle value. We find, as in
the other examples, e(f) = 2, hence e²(f) = 4. e(f²) = 24/5.

    p =           1      2      3      Sum
    c_p           2/5    1/5    2/5
    r_p c_p       2/5    2/5    6/5    2; this is e(f).
    r_p² c_p      2/5    4/5    18/5   24/5; this is e(f²).
    v_p² c_p      2/5    0      2/5    4/5; this is e(v²).

𝔣² = e(v²) = 4/5; 𝔣 = 0.89. This is here greater than in the first case. Thus
the same estimate of f is here less reliable. This is plausible because the
outside values are here more probable than in the other cases.
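
The three examples can be recomputed mechanically. The following sketch takes the r_p-values and c_p-values from the examples; the function name `reliability` and the dictionary layout are merely illustrative.

```python
from fractions import Fraction
from math import sqrt

def reliability(values, weights):
    """Return e(f), e(f^2), the estimated square error and standard error."""
    e_f = sum(v * w for v, w in zip(values, weights))
    e_f2 = sum(v ** 2 * w for v, w in zip(values, weights))
    square_error = e_f2 - e_f ** 2          # T103-1a
    return e_f, e_f2, square_error, sqrt(square_error)

r = [1, 2, 3]
examples = {
    1: [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
    2: [Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)],
    3: [Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)],
}

results = {number: reliability(r, c) for number, c in examples.items()}
for number, (e_f, e_f2, sq, se) in results.items():
    print(f"example {number}: e(f)={e_f}, e(f^2)={e_f2}, "
          f"square error={sq}, standard error={se:.2f}")
```

In all three cases the estimate is 2, while the estimated square error grows as more weight is placed on the outside values.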


The judgment that a given estimate e(f) is highly reliable is explicated
in our method by the statement that its estimated square error 𝔣²(f) is
small. It is important to see clearly that this judgment of reliability itself
is again an inductive judgment. It says something about the probable
relation between the estimated value r' and the actual value r of f. A
judgment on the actual relation between these two values cannot be given
by inductive logic; it presupposes a determination of the actual value r,
which can be made only by empirical investigations going beyond e.
Furthermore, the judgment on the reliability of an estimate e(f) of f is
itself an estimation e(v²) of the square error; and the e-function used in
the latter estimation is the same as that used for e(f). Therefore, this
determination of the reliability of estimates is, so to speak, an internal
affair within one system of inductive logic based upon a chosen function c;
it cannot be used as a method for obtaining an external, objective judgment
on the goodness of a system of inductive logic. Suppose that X_1
chooses the c-function c_1, and the estimate-function e_1 based upon c_1 and 𝔣_1
based upon e_1. Suppose he finds as estimate for f(u) e_1(f) = 0.6 with
𝔣_1(f) = 0.1. Similarly, X_2 chooses c_2 and, based upon it, e_2 and 𝔣_2, and
finds, for the same f(u), on the same evidence, e_2(f) = 0.7 with 𝔣_2(f) =
0.01. X_2 might perhaps be tempted to conclude from these results that 0.7
is a much more reliable estimate of f(u) than 0.6, because his estimated
standard error is much smaller than that found for e_1(f); and if similar results
would be obtained for other magnitudes, he might claim that his
estimate-function e_2 were generally more reliable than e_1. However, these
conclusions would constitute a disastrous fallacy. If c_2 happens to be an
inadequate explicatum for probability₁, then not only are the c_2-values
and the e_2-values unreliable but also the estimated standard error 0.01,
since this is determined by the same function c_2; hence in this case nothing
can be inferred from the smallness of the value 0.01. What then is the use
of determining the estimated standard error? It is not a method of comparing
two functions c_1 and c_2. It is useful only if we have other reasons
for regarding a chosen function c as adequate. (Such reasons may, for instance,
consist in the fact that in many actual or imagined knowledge
situations the values of c are sufficiently in agreement with the inductive
thinking of a careful scientist.) If c is adequate, then the method of the
estimated standard error may be used for comparing the reliability of
estimates made on the basis of different evidences but with the same
function c.


This concludes the general discussion of estimation. In the following 
sections we shall deal with those special cases of estimation which arise 
with respect to our language systems 𝔏.


§ 104. Estimation of Frequencies 


In the remainder of this chapter the general method of estimation previously
developed, that is, the c-mean estimate-function e, is applied to our systems 𝔏,
and in particular to absolute and relative frequencies, which can be expressed
in 𝔏. First, the frequency of true sentences in a class of given sentences is
studied. It is found that the estimate of the relative frequency of truth among
given sentences of any kind is equal to the arithmetic mean of the degree of
confirmation of these sentences (T2b). This important result justifies our earlier
explanation of probability₁ as an explicandum in terms of an estimate of relative
truth-frequency (§ 41D). Secondly, the estimate of the frequency of a given
property among given individuals is discussed. In analogy to our earlier distinction
between direct and predictive inferences, we distinguish now between
direct and predictive estimations. The estimation of the frequency in a sample
is called direct, if the given frequency on which the estimation is based is that
in the population; it is called predictive, if the given frequency refers to another
sample.


In the foregoing sections of this chapter we have discussed the general 
features of a method of estimation for any languages, especially languages 
containing quantitative magnitudes like those of physics. This method 
consists in the use of the c-mean estimate-function e. In the remainder of 
this chapter we shall discuss the results to which this method leads if it 
is applied to our simple language systems 𝔏.

The systems 𝔏 do not contain any quantitative magnitudes in the ordinary
sense like length, mass, temperature, and the like. Nevertheless
certain numerical functions based, not on measurement, but on counting,
can be expressed in the systems 𝔏. The two most important of these functions
are the absolute and the relative frequency of a property within a
given class. In the following our method of estimation will be applied to 
these two functions only. 

We shall first discuss the frequency of truth among given sentences, and 
later the frequency of any molecular property among given individuals. 
We shall find that the first application of frequency is the most general 
case expressible in our language systems and that the second can be ob- 
tained as a specialization of the first. 

Suppose a finite class 𝔎_i of s sentences is given by enumeration:
{i_1, i_2, ..., i_s}. By the absolute truth-frequency or, briefly, truth-frequency
in 𝔎_i, in signs of the metalanguage: ‘tf(𝔎_i)’, we mean the number of true
sentences in 𝔎_i. By the relative truth-frequency in 𝔎_i, in signs: ‘rtf(𝔎_i)’,
we mean the quotient tf(𝔎_i)/s.

For example, let 𝔎_1 be the class {i_1, i_2, i_3}, where i_1, i_2, and i_3 are given
sentences of any form of a given system 𝔏. That the cardinal number s of
𝔎_1 is three is seen from the definition of 𝔎_1; it is obviously not dependent
on the facts referred to by the three sentences of 𝔎_1 or any other facts.
On the other hand, the sentence
(1) ‘tf(𝔎_1) = 2’
is a factual, empirical sentence (of the metalanguage); it says that,
among the three sentences of 𝔎_1, there are exactly two which are true.
Since we know from the definition of 𝔎_1 that s = 3, we can deduce from
(1) the sentence
(2) ‘rtf(𝔎_1) = 2/3’
and vice versa. Hence (1) and (2) are logically equivalent; they express
the same factual content in different conceptual forms. Both sentences
belong to the metalanguage, not to 𝔏. 𝔏 contains neither functors like
‘tf’ or ‘rtf’, nor numerical expressions like ‘2’ or ‘2/3’. Nevertheless, the
common factual content of (1) and (2) is expressible in 𝔏, in a still different
form. The term ‘truth’ is here understood in the semantical sense (see
§ 17). The statement ‘i is true’ has the same factual content as the statement
i in 𝔏 and can therefore be translated into i. Likewise, the statement
‘i is false’ is translatable into ~i. (1) says that two of the three
i-sentences are true and one is false. There are three possibilities in which
this is the case. Each of these possibilities is expressible by a conjunction
in 𝔏, and hence the whole by the disjunction of these conjunctions, viz.
(i_1 . i_2 . ~i_3) V (i_1 . ~i_2 . i_3) V (~i_1 . i_2 . i_3). Let this be h_2. Thus this
disjunction h_2 in 𝔏 may serve as a translation of (1), and simultaneously
as a translation of (2), because the latter is logically equivalent to (1).

In general terms, let 𝔎_i be a class of s given sentences, {i_1, i_2, ..., i_s}.
We shall now explain a few terms to be used only in the present discussion
and in the proof of T2a. The following sentences are called the k-sentences
for the given class (as in T21-7; their number is 2^s): first the conjunction
i_1 . i_2 . ... . i_s, and furthermore all conjunctions formed from this one
by negating some or all of the components. A k-sentence obtained by
negating exactly s − m (m = 0, ..., s) of the s components in the original
conjunction is called a k^m-sentence. (The number of the k^m-sentences
is \binom{s}{m}.) For every m, let h_m be the disjunction of all k^m-sentences (in the
lexicographical order). There is only one k^s-sentence, viz., the conjunction
of the i-sentences; thus h_s is this conjunction itself. There is likewise only
one k^0-sentence, viz., the conjunction of the negations of the i-sentences;
thus h_0 is this conjunction. For all other values of m, the number of k^m-sentences
is ≥ 2, and h_m is their disjunction. For any particular value m, the
sentence (in the metalanguage) ‘tf(𝔎_i) = m’ is true if and only if one of
the k^m-sentences is true; hence it is translatable into their disjunction,
that is, h_m. This leads to the following theorem.
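
The construction of the k-sentences can be made concrete by enumeration. In the sketch below a k-sentence is represented, as an assumption of the illustration, by a tuple of truth-values (True for an unnegated component), and the k^m-sentences are taken to be those with exactly m unnegated components, so that ‘tf = m’ holds exactly when one of them is true. The function names and the spelling `i1`, `i2`, ... are hypothetical.

```python
from itertools import product
from math import comb

def k_sentences(s):
    """All 2**s k-sentences as tuples of booleans (True = unnegated component)."""
    return list(product([True, False], repeat=s))

def group_by_true_count(s):
    """Map each m to the k-sentences having exactly m unnegated components."""
    groups = {m: [] for m in range(s + 1)}
    for k in k_sentences(s):
        groups[sum(k)].append(k)
    return groups

def render(k):
    """Write a k-sentence as a conjunction of i-sentences and their negations."""
    return " . ".join(("" if positive else "~") + f"i{n}"
                      for n, positive in enumerate(k, start=1))

s = 3
groups = group_by_true_count(s)
# The disjunction of the k-sentences with exactly two true components.
h_2 = " V ".join(f"({render(k)})" for k in groups[2])
print(h_2)
for m in range(s + 1):
    print(m, len(groups[m]), comb(s, m))
```

For s = 3 the disjunction printed for m = 2 is the translation of ‘tf = 2’ discussed in the example.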


T104-1. Let 𝔏 be any finite or infinite system, c a regular c-function
in 𝔏, e based upon c (D100-1), e any non-L-false sentence in 𝔏, and 𝔎_i a
class of s given sentences in 𝔏. For any m (m = 0, 1, ..., s), let h_m be as
explained above. Then the estimate of truth-frequency for 𝔎_i on e is determined
as follows:


    e(tf,𝔎_i,e) = \sum_{m=1}^{s} [m \times c(h_m,e)] .


(From T100-1. The value m = 0, although possible, need not be included
in the sum because the term of the sum for this value is 0.)


We shall now proceed to prove a theorem (T2) which is of great importance
for the foundation of inductive logic, since it justifies the interpretation
of probability₁ as an estimate of relative truth-frequency. We have
used this interpretation in our earlier discussion of the meaning of probability₁
as an explicandum (§ 41D). The theorem is of great generality. It
holds for all regular c-functions. It holds for any given sentences, irrespective
of inductive or even deductive dependencies among them. One
such sentence may, for instance, L-imply another one or even be L-equivalent
to it. In the latter case, the same proposition would be expressed by
two or more of the given sentences, and hence their truth-values would
necessarily be the same. Nevertheless the theorem of the estimate of
truth-frequency holds in this case just as if the sentences were logically
independent of each other.


+T104-2. Let 𝔏, c, e, e, 𝔎_i, and s be as in T1. Let the sentences of 𝔎_i be
i_1, i_2, ..., i_s. Then the following holds.


a. e(tf,𝔎_i,e) = \sum_{n=1}^{s} c(i_n,e) .


Proof. Concerning k-sentences, k^m-sentences, and the hypotheses h_m (m = 0
to s), see the explanations preceding T1. For any n (from 1 to s), the nth conjunctive
component in some of the k-sentences (indeed in half of them) is i_n;
let 𝔎_n be the class of these k-sentences. In the other k-sentences it is ~i_n.
The sentence i_n is L-equivalent to the disjunction of the sentences of 𝔎_n
(T21-7d). Therefore, since the k-sentences are L-exclusive in pairs (T21-7a),
c(i_n,e) = \sum_k c(k,e) where k runs through the k-sentences in 𝔎_n (T59-1m). Hence,

(4)  \sum_{n=1}^{s} c(i_n,e) = \sum_{n=1}^{s} \sum_{k} c(k,e) ,

where for every n, the second sum covers the k-sentences in 𝔎_n. The c-value of
any k^m-sentence appears m times in the double sum in (4). (Consider, as an
example, the k-sentence k' which contains i_2, i_4, i_5, and the negations of the
other i-sentences as conjunctive components. k' is a k^3-sentence. Its c-value
occurs in the double sum in (4) three times: first for n = 2, because k' contains
i_2 and hence belongs to 𝔎_2; then for n = 4 because of i_4; and finally for
n = 5 because of i_5.) Therefore the double sum is equal to \sum_{k} [m \times c(k,e)],
where the sum covers all k-sentences, and the c-value of any k^m-sentence
(m = 0 to s) is multiplied by m. This sum is now transformed into another
double sum by grouping together the k-sentences with equal m:
\sum_{m=1}^{s} [m \times \sum_{k} c(k,e)], where for any m, the second sum covers
all k^m-sentences. Now the value of this second sum is equal to c(h_m,e) (T59-1m),
because h_m is the disjunction of the k^m-sentences and these sentences are
L-exclusive in pairs. Thus we obtain from (4):

(5)  \sum_{n=1}^{s} c(i_n,e) = \sum_{m=1}^{s} [m \times c(h_m,e)] .

The right-hand side here is the estimate of tf (T1); hence the theorem.


b. e(rtf,𝔎_i,e) = (1/s) \sum_{n=1}^{s} c(i_n,e); that is, the estimate of the relative
truth-frequency in 𝔎_i is the arithmetic mean of the c-values of the sentences
in 𝔎_i. (From (a), T100-3.)

c. (Corollary.) If all i-sentences have the same c-value on e, then
e(rtf,𝔎_i,e) is equal to this value. (From (b).)
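
The theorem can be tried out on a miniature model in which the given sentences are deliberately not independent. Everything in the sketch is an assumption made for illustration: four mutually exclusive "worlds" with weights standing in for c-values on e, and sentences represented by the sets of worlds in which they are true. One sentence is L-equivalent to another and one L-implies another, yet the two ways of computing the estimate of tf agree.

```python
from fractions import Fraction

# Hypothetical miniature model (an assumption of this sketch): the weights
# play the role of c(., e) for four mutually exclusive possibilities.
weights = {
    "w1": Fraction(1, 10),
    "w2": Fraction(2, 10),
    "w3": Fraction(3, 10),
    "w4": Fraction(4, 10),
}

# Each sentence is given by the set of worlds in which it is true.
sentences = [
    {"w1", "w2"},          # i_1
    {"w1", "w2"},          # i_2, L-equivalent to i_1
    {"w1"},                # i_3, L-implies i_1
    {"w2", "w3", "w4"},    # i_4
]
s = len(sentences)

def c(worlds):
    """c-value of a sentence, given as the set of worlds where it holds."""
    return sum(weights[w] for w in worlds)

def truth_frequency(world):
    """Number of the given sentences that are true in this world."""
    return sum(world in sentence for sentence in sentences)

# T104-1: sum over m of m * c(h_m, e); h_m holds where exactly m are true.
via_t1 = sum(
    m * c({w for w in weights if truth_frequency(w) == m})
    for m in range(1, s + 1)
)
# T104-2a: the sum of the c-values of the given sentences.
via_t2a = sum(c(sentence) for sentence in sentences)
# T104-2b: the arithmetic mean of the c-values.
estimate_rtf = via_t2a / s

print(via_t1, via_t2a, estimate_rtf)
```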

T2c can be used in the clarification of probability₁ as an explicandum.
Suppose a person X has an idea of what he means by the estimate (or
expectation-value, in the classical sense of this term) of a magnitude; this
idea is an explicandum, that is to say, it is clear enough to him for practical
purposes on a pre-systematic level, although he may not yet have a
systematic explicatum for it. Suppose further that the idea of probability₁
as a quantitative concept which takes a numerical value for any given
hypothesis with respect to any given evidence is less clear to X. (A psychological
situation of this kind is by no means fictitious. Many authors,
among them perhaps a majority of contemporary statisticians, reject the
logical concept of probability, or admit it only in a nonquantitative form.
On the other hand, practically all statisticians either apply methods of
estimation or are at least in search of satisfactory methods, which shows
that they admit the concept of estimation at least as an explicandum.)
The result T2c shows the following way of explaining to X the meaning
of probability₁ in terms of estimation. If you ascribe the same value, say,
0.3, to several, say, one hundred, hypotheses on the basis of the same evidence
e, then thereby you show that you estimate, on the evidence e, the
number of true ones among the one hundred hypotheses as thirty. Or, the
other way round: suppose that on the basis of your total observational
knowledge expressed by e, you have equal confidence in each of one hundred
hypotheses but do not know how to measure this confidence quantitatively,
but you are able to make estimations and you estimate the
number of true ones among the hundred hypotheses as thirty, then take
0.3 as a measure of rational confidence, that is, ascribe the value 0.3 to
each of the hypotheses as its probability₁ on e. T2b can be used analogously,
but in a more general way. Here the values of probability₁ need not be
equal; if they differ, still their arithmetic mean expresses the estimate of
the relative truth-frequency. In our earlier discussions on the meaning of
probability₁ as an explicandum, we have equated the value of probability₁
with the estimate of rtf (§ 41D). This procedure is now justified by T2c.
(This theorem itself refers to the concepts of estimate and of degree of
confirmation as explicata. But it holds generally for all regular c-functions.
Therefore it shows that we shall not involve ourselves in contradictions
when we explain the concept of an estimate as an explicandum in terms
of probability₁, and then again explain probability₁ in terms of an estimate
of rtf.)

Now we proceed to another application of the frequency concepts, viz., 
the frequency of a given property among the individuals of a given class. 
These explanations will be perfectly analogous to the former ones on 
truth-frequency. Here again, we formulate these concepts first in the 
metalanguage. Let M be a property expressible by a molecular predicate
in 𝔏, say, ‘M’. Let K be a finite class of s individuals defined by enumeration:
K =Df {a_1, a_2, ..., a_s}. By the absolute frequency of M in K, in
signs of the metalanguage: ‘af(M,K)’, we mean the number of those individuals
in K which have the property M. The relative frequency of M in
K is rf(M,K) =Df af(M,K)/s.

To take a concrete example, let us define: K =Df {a,b,c}. It follows from
this definition that the cardinal number of K is three. On the other hand,


(6) ‘af(M,K) = 2’
is a factual, empirical sentence. From (6) we can deduce
(7) ‘rf(M,K) = 2/3’.


Hence (6) and (7) are logically equivalent. The following sentence (8) is a
translation of (6) into 𝔏; it serves simultaneously as a translation of (7):


(8) ‘(Ma . Mb . ~Mc) V (Ma . ~Mb . Mc) V (~Ma . Mb . Mc)’.


This sentence is a special case of our former h_2 for s = 3, with full sentences
of ‘M’ as i-sentences. It is a statistical distribution (D26-6c) for
the division ‘M’, ‘~M’ with respect to the three individuals of K. The
three disjunctive components in (8) are isomorphic individual distributions
(D26-6a, D26-3).

The subsequent theorem T3 is analogous to T1. It determines the estimate
of af in terms of the c-values for hypotheses which state the possible
values of af. These values are the cardinal numbers 0, 1, ..., s (or those
of them which are not excluded by e). Therefore the hypotheses stating
these values are statistical distributions, like the example sentence (8) (for
s = 3, af = 2).

T104-3. Let 𝔏, c, e, and e be as in T1. Let K be a class of s individuals
in 𝔏 (s > 0), ‘M’ a molecular predicate in 𝔏, h_m (m = 0, 1, ..., s) the
statistical distribution for ‘M’ and ‘~M’ with respect to the s individuals
in K with the cardinal number m for ‘M’. Then


    e(af,M,K,e) = \sum_{m=1}^{s} [m \times c(h_m,e)] .  (From T1.)


The following theorem states the simple relation between the estimates
of rf and af.

T104-4. Let 𝔏, c, e, and e be as in T1. Let K, s, and ‘M’ be as in T3.
Then the following holds.

a. e(rf,M,K,e) = e(af,M,K,e)/s. (From T100-3.)

b. e(rf²,M,K,e) = e(af²,M,K,e)/s². (Likewise.)

c. 𝔣²(rf,M,K,e) = 𝔣²(af,M,K,e)/s². (From T103-1a, (a), (b).)

d. 𝔣(rf,M,K,e) = 𝔣(af,M,K,e)/s. (From (c).)

e. 𝔣(rf)/e(rf) = 𝔣(af)/e(af). (From (d), (a).)


T4a compares the estimates for rf and af themselves. T4c compares
𝔣², i.e., e(v²) (D103-1a), the estimated square errors of those estimates of
rf and af. T4d compares the estimated standard errors 𝔣. T4e says that
the estimated relative errors, that is, the quotients 𝔣/e, are the same for
rf and af. (The relative error in this sense is the inductive analogue to the
statistical concept of coefficient of variation.) These relations will be used
in the following.
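
The scaling relations of T104-4 can be checked on an arbitrary distribution of c-values over the hypotheses h_m. The c-values below (for s = 4) are invented for illustration; the relative errors are compared in squared form so that the whole computation stays in exact fractions.

```python
from fractions import Fraction

s = 4
# Illustrative c-values for the hypotheses h_m (af = m), m = 0, ..., s.
c = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16),
     Fraction(4, 16), Fraction(1, 16)]
assert sum(c) == 1

def estimate(values):
    """c-mean estimate of a magnitude taking values[m] under h_m."""
    return sum(v * c_m for v, c_m in zip(values, c))

af_values = list(range(s + 1))
rf_values = [Fraction(m, s) for m in af_values]

e_af = estimate(af_values)
e_rf = estimate(rf_values)
sq_af = estimate([m ** 2 for m in af_values]) - e_af ** 2   # T103-1a
sq_rf = estimate([x ** 2 for x in rf_values]) - e_rf ** 2

print("e(af) =", e_af, " e(rf) =", e_rf)
print("square errors:", sq_af, sq_rf)
print("squared relative errors:", sq_af / e_af ** 2, sq_rf / e_rf ** 2)
```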

Let K_1 be the class of individuals described in e and K_2 that described
in h. The two chief cases to be distinguished here are the following. (1) K_2
is contained in K_1; (2) K_2 is outside of K_1. In the first case, we have earlier
called K_1 the population and K_2 a sample from the population; and the
determination of c(h,e) was called the direct (or internal) inference. In the
second case, K_1 is one sample and K_2 is another, nonoverlapping sample;
the determination of c in this case was called the predictive (or external)
inference. Now an estimate of af or rf of M in K_2 with respect to e is based
on c(h,e). Hence this estimate is made in the first case with the help of
direct inferences, in the second case with the help of predictive inferences.
Therefore we shall speak in the first case of a direct (or internal) estimate,
and in the second case of a predictive (or external) estimate.

The direct estimation of frequencies will be discussed in the next section,
and the predictive estimation later.


§ 105. Direct Estimation of Frequencies 


The direct estimation of frequencies is here discussed, that is to say, the fre- 
quency of a property M in a population is given, and on this basis an estimate 
of the frequency in a sample is to be made. Theorems are given which determine 
the values of these estimates and their estimated square errors (T1). The most 
important result is this (T1h): the estimate of the relative frequency in the
sample is equal to the given relative frequency in the population. The theorems
on direct estimation hold for all symmetrical c-functions, like the theorems on
the direct inference from which they are derived. 


In this section we shall deal with direct estimation, that is, with the 
estimates of af and rf in a sample based on the given frequency in the 
population. We found earlier that it was possible in the case of the direct 
inference, in distinction to all other kinds of inductive inference, to prove 
theorems determining the c-values without specifying a particular c-func- 
tion, because these values are the same for all symmetrical c-functions 
(§§ 94-96). We can now use the earlier results concerning c for deriving 
results concerning direct estimation, which likewise hold for all 
symmetrical c-functions. 

As in the earlier theorem stating the exact c-values for the direct 
inference (T94-1), we consider a population of n individuals. We take here 
p = 2, that is, a division consisting of only two properties, M and non-M. 
The evidence e is a statistical distribution stating that the af of M in the 
population is n_1, and that of non-M n_2 = n - n_1; hence the rf of M is 
n_1/n = r_1, and that of non-M is n_2/n = r_2. K is a sample of s individuals 
taken from the population. The earlier theorem (T94-1b(2)) gives the 
c-value for any s_1, that is, for any possible value of af of M in the sample. 
This enables us now to determine the estimate of af in the sample, and 
then that of rf. 

+T105-1. Direct estimates of frequencies. Let 𝔏, ‘M’ (for ‘M_1’), e, 
n, n_1, n_2, r_1, r_2 be as in T94-1, but with p = 2. Let K be a class of s 
individuals belonging to the n individuals referred to in e. Let h_m (m = 0, 
..., s) be as in T104-3. Let c be a symmetrical c-function and e be based 
on c. Then the following holds. 

a. e(af,M,K,e) = s r_1. (From T104-3, T94-1d.) 

b. That value of m, that is, af, for which c(h_m,e) has its maximum either 
coincides with e(af), if the latter is an integer, or is an integer close 
to e(af). (From (a), T94-1c.) 

c. (1) $e(af^2,M,K,e) = \frac{s r_1}{n - 1}\,[\,s(n_1 - 1) + n_2\,]$; 

(2) AE 4]. 

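Proof. $e(af^2) = \sum_m [m^2 \times c(h_m,e)]$ in analogy to T104-3, 
$= \sum_m \left[ m^2 \binom{n_1}{m} \binom{n_2}{s-m} / \binom{n}{s} \right]$ (T94-1b(2)). 
$m^2 \binom{n_1}{m} = m\, n_1 \binom{n_1-1}{m-1}$ (T40-8e). Hence the sum is (with l = m - 1) 
$\frac{n_1}{\binom{n}{s}} \sum_l \left[ (l+1) \binom{n_1-1}{l} \binom{n_2}{s-1-l} \right]$. The latter sum is 
$\sum_l \left[ l \binom{n_1-1}{l} \binom{n_2}{s-1-l} \right] + \sum_l \left[ \binom{n_1-1}{l} \binom{n_2}{s-1-l} \right]$. 
The first of these two sums becomes (with T40-8e, and k = l - 1) 
$(n_1 - 1) \sum_k \left[ \binom{n_1-2}{k} \binom{n_2}{s-2-k} \right]$, hence $(n_1 - 1) \binom{n-2}{s-2}$ (T40-9c). The second of the 
above two sums is $\binom{n-1}{s-1}$ (T40-9c). Hence, with some simple transformations 
(using D40-2a), the assertion. 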

d. Approximation for the case that n_1 (and hence n) is large in relation 
to 1 (it need not be large in relation to s): 
e(af^2,M,K,e) ≈ s r_1 (s r_1 + r_2). (From (c)(2).) 

e. $f^2(af,M,K,e) = \frac{s r_1 r_2}{1 - 1/n} \left(1 - \frac{s}{n}\right)$. (From T103-1a, (c)(2), (a).) 

f. Approximation for the case that n is large in relation to 1: 
$f^2(af,M,K,e) \approx s r_1 r_2 \left(1 - \frac{s}{n}\right)$. (From (e).) 

g. Approximation for the case that n is, moreover, large in relation to s: 
$f^2(af,M,K,e) \approx s r_1 r_2$; hence $f \approx \sqrt{s r_1 r_2}$. (From (f).) 

h. e(rf,M,K,e) = r_1. (From T104-4a, (a).) 

i. $f^2(rf,M,K,e) = \frac{r_1 r_2}{s (1 - 1/n)} \left(1 - \frac{s}{n}\right)$. (From T104-4c, (e).) 

j. Approximation for the case that n is large in relation to 1: 
$f^2(rf,M,K,e) \approx \frac{r_1 r_2}{s} \left(1 - \frac{s}{n}\right)$. (From T104-4c, (f).) 

k. Approximation for the case that n is, moreover, large in relation to s: 
$f^2(rf,M,K,e) \approx r_1 r_2 / s$; hence $f(rf) \approx \sqrt{r_1 r_2 / s}$. (From T104-4c, (g).) 

l. Approximation for the case that n is large in relation to s: 
$f(rf)/e(rf) = f(af)/e(af) \approx \sqrt{r_2/(s r_1)}$. (From (h), (k); T104-4e.) 


The most important results are T1a: e(af) = s r_1, and T1h: e(rf) = r_1. 
Thus, if the rf of M in the population is known to be r_1, then the estimate 
of rf in any sample is likewise r_1. Note that this result holds exactly for 
any n and s, even if the population is not large and the sample constitutes 
a considerable part of the population. (The proof of T1a is based, not on 
the binomial law (T95-1), but on the unrestricted theorem for the direct 
inference (T94-1).) 
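
The exactness of T1a and T1h for every n and s can be checked by brute force. The following Python sketch assumes that the c-values of the direct inference (T94-1b(2)) have the hypergeometric form that appears in the proof of T1c; the function names are ours and purely illustrative.

```python
from fractions import Fraction
from math import comb


def direct_c_values(n, n1, s):
    """c(h_m, e) for m = 0..s, assuming the hypergeometric form of T94-1b(2)."""
    n2 = n - n1
    total = comb(n, s)
    return [Fraction(comb(n1, m) * comb(n2, s - m), total) for m in range(s + 1)]


def direct_estimates(n, n1, s):
    """Return (e(af), e(rf)) as the c-weighted mean of the possible values of af."""
    c = direct_c_values(n, n1, s)
    e_af = sum(m * cm for m, cm in enumerate(c))
    return e_af, e_af / s


# The estimates agree with s*r_1 and r_1 exactly, even when the sample
# is a large part (or the whole) of the population.
for n, n1 in [(14, 10), (5, 2), (30, 7)]:
    r1 = Fraction(n1, n)
    for s in range(1, n + 1):
        e_af, e_rf = direct_estimates(n, n1, s)
        assert e_af == s * r1
        assert e_rf == r1
```

Exact rational arithmetic is used so that the equalities are verified as identities and not merely up to rounding.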

T1b says that the estimate of af and the most probable value of af are 
either equal or close together. 

The values found for e(af^2) (T1c and d) are chiefly used for determining 
f^2 (T1e, f, g), that is, the estimated square error by which we measure 
the reliability of the estimate made for the af in the sample. The value 
T1g can be derived also from the binomial law. For the case where the 
sample is not a small part of the population, T1f gives a convenient 
approximation more exact than T1g. The estimated square error for the 
estimate of the rf in the sample, which is r_1 (T1h), is stated in T1i, j, and k. 
The approximation T1k can be derived also from the binomial law. When 
the sample size s increases, f^2(af) increases and is approximately 
proportional to s; but f^2(rf) decreases and is approximately proportional to 1/s. 
T1l states the relative error, i.e., f/e, which is the same for rf and af 
(T104-4e). 
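
The step from the estimate of af^2 to the estimated square error can be written out. The following is a sketch, assuming that T103-1a is the identity f^2(u) = e(u^2) - e^2(u) and taking e(af^2) in the form (s r_1/(n - 1)) [s(n_1 - 1) + n_2]:

```latex
\begin{aligned}
f^2(af) &= e(af^2) - e^2(af)
         = \frac{s r_1}{n-1}\,\bigl[s(n_1-1)+n_2\bigr] - s^2 r_1^2 \\
        &= s r_1 \cdot \frac{n\,\bigl[s(n_1-1)+n_2\bigr] - s\,n_1 (n-1)}{n(n-1)} \\
        &= s r_1 \cdot \frac{n_2\,(n-s)}{n(n-1)}
         = s r_1 r_2\,\frac{n-s}{n-1}
         = \frac{s r_1 r_2}{1-1/n}\Bigl(1-\frac{s}{n}\Bigr).
\end{aligned}
```

The numerator in the second line reduces to n n_2 - s(n - n_1) = n_2 (n - s). Dropping 1/n gives the approximation for large n, and dropping s/n as well gives the binomial value s r_1 r_2.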

T1 is formulated for a finite population size n. Now it was mentioned 
earlier (§ 95) that the values of c for the direct inference stated in the 
binomial law (T95-1) hold exactly if the population is infinite and r_1 is the 
limit of rf with respect to a fixed serial order of the individuals. Under 
these conditions the values stated in T1a, d, g, h, k, and l hold likewise 
exactly, because they can be derived from the binomial law. 

The customary methods in statistics regard the rf in the population, 
for which they use the term ‘probability’, as given or hypothetically 
assumed; therefore they correspond to what we call the direct inference and 
the direct estimation. Hence our direct estimate of af corresponds to what 
is called in statistics the expected value of af; f^2(af) corresponds to the 
mean square deviation (or the second moment about the mean); f(af) to 
the standard deviation σ. Thus, leaving aside the difference in 
interpretation (which is here again the difference between inductive and statistical 
concepts), we can compare the values stated in T1 with the customary 
values given in statistics. We find that the values given in T1a, d, and g 
agree with the traditional values in statistics. However, the latter values 
are usually derived from the binomial law. If the population is infinite, 
this law and hence the values mentioned hold exactly. On the other hand, 
in the case of a finite population the binomial law holds only 
approximately. In order to find the exact values in this case, derivations 
independent of the restrictions of the binomial law must be used, as was done 
in the proofs of T1a, c, and e above. It is interesting to see how in this case 
the exact values (T1c, e) differ from the customary values (T1d, g); and 
it is likewise interesting to see that the value for e(af) (T1a) holds 
unchanged. In comparison with our general method for the estimation of af, 
the customary method may be characterized as dealing with the special 
case in which (1) the estimation is direct, that is, the frequency in the 
population is given, and (2) the population is infinite. 


Examples for direct estimates of frequencies (T1). We take the numerical 
values of the two examples for direct inference given in § 95. 

First Example. Small population: n = 14, n_1 = 10; hence n_2 = 4, r_1 = 5/7, 
r_2 = 2/7. We consider a sample K with s = 7. We find (T1a): e(af) = s r_1 = 5. 
Since this is an integer, it must be the most probable value of af, i.e., that value 
of s_1 for which c(h_st,e) has its maximum (T1b); the table in § 95 shows that 
this is indeed the case. T1e: f^2(af) = 10/13 = 0.769; hence f(af) = 0.877. 
f(af)/e(af) = 0.175. [The conditions for the approximations T1f and g are here 
not fulfilled. In particular, T1g cannot be applied because the sample is one 
half of the population; T1g would give for f^2 the value 10/7 = 1.43, which is 
nearly double the correct value. T1f would give 10/14 = 0.714 instead of 
10/13 = 0.769; here the deviation is much smaller than in T1g.] T1h: e(rf) = 
r_1 = 5/7 = 0.714. Since f^2(af) has been determined, it is simpler to determine 
f^2(rf) by T104-4c than by T1i: f^2(rf) = f^2(af)/s^2 = 0.0157. Hence (or with 
T104-4d): f(rf) = 0.125. f(rf)/e(rf) = 0.175; the relative error is here the same 
as for af, in accordance with T104-4e. 
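
The figures of the first example can be recomputed exactly. The sketch below again assumes the hypergeometric form of the c-values for the direct inference and takes the estimated square error to be e(af^2) - e^2(af); the function name is hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def square_error_af(n, n1, s):
    """Exact f^2(af) = e(af^2) - e(af)^2, by enumerating m = 0..s."""
    n2 = n - n1
    total = comb(n, s)
    c = [Fraction(comb(n1, m) * comb(n2, s - m), total) for m in range(s + 1)]
    e_af = sum(m * cm for m, cm in enumerate(c))
    e_af2 = sum(m * m * cm for m, cm in enumerate(c))
    return e_af2 - e_af ** 2


n, n1, s = 14, 10, 7
r1, r2 = Fraction(n1, n), Fraction(n - n1, n)

exact = square_error_af(n, n1, s)               # value stated under T1e
approx_f = s * r1 * r2 * (1 - Fraction(s, n))   # approximation T1f
approx_g = s * r1 * r2                          # approximation T1g

print(exact, approx_f, approx_g)    # 10/13 5/7 10/7
print(round(sqrt(exact), 3))        # 0.877
print(round(sqrt(exact) / s, 3))    # 0.125
```

The value 5/7 printed for T1f is the 10/14 of the text in lowest terms.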

Second Example. Large population, size not specified; either infinite or a large 
finite n. r_1 = 5/7; hence r_2 = 2/7. Sample K with s = 7. T1a: e(af) = s r_1 = 5, 
as in the first case; here it is again the most probable value of af. In the present 
case T1g can be applied: f^2(af) = 10/7 = 1.429. Hence f(af) = 1.195. This is 
considerably greater than in the first case. An estimate from a smaller 
population is generally more reliable than from a larger one; this can be seen from 
T1e. f(af)/e(af) = 0.239. T1h: e(rf) = r_1 = 5/7 = 0.714, as in the first case. 
Here we may apply T1k; but it is again simpler to use T104-4c: f^2(rf) = 
f^2(af)/s^2 = 0.0292. Analogously, with T104-4d: f(rf) = f(af)/s = 0.171. 
f(rf)/e(rf) = 0.239, as for af. 


§ 106. Predictive Estimation of Frequencies 


The last two sections of this chapter deal with predictive estimation of fre- 
quencies. The evidence describes an observed sample, and the estimate is made 
for the frequency of a property M in a second sample K not overlapping with 
the first. The values of predictive estimates cannot be stated generally, be- 
cause they vary with the c-function chosen; but general theorems concerning 
relations between such estimates can be stated. A. It is found that the estimate 
of the relative frequency of M in K is equal to the degree of confirmation for 
any singular prediction ‘Mb’ (T1c). Therefore this estimate is the same for any 
finite class K, independently of the number and choice of the individuals in K 
(T1d). The same estimate holds also for an infinite class. B. It is shown that the 
concept of the limit of the estimate of rf in an infinite sequence does not involve 
any of the problems and difficulties which are connected with the limit of rf 
itself. C. The problem of the reliability of a value of degree of confirmation is 
discussed. A tentative solution is indicated in terms of the estimated standard 
error of an estimate of the relative truth-frequency. 


A. Theorems on Predictive Estimation of Frequencies 


We shall now study the predictive estimation of frequencies in a class 
of individuals. It was mentioned earlier (§ 94) that the predictive infer- 
ence, in distinction to the direct inference, depends upon the choice of a 
c-function. That is to say, theorems determining the c-values for the pre- 
dictive inference cannot be stated, like those for the direct inference, in a 
general form for all symmetrical c-functions, but only for a particular 
c-function; this will be done later for our function c* (in Vol. II; cf. 
§ 110C). For the same reason it is not possible to state theorems giving 
e-values for the predictive estimation in the general form as we did for the 
direct estimation in the preceding section; these theorems will likewise be 
stated later for e*, based upon c*. Nevertheless, it will be possible here to 
state general theorems on the predictive estimation which do not give 
the e-values themselves but relations between them. 

In the case of the predictive estimation, the evidence e refers to one 
sample and the estimate is to be made for a second sample K not over- 
lapping with the first. If the estimate concerns the frequency of a property 
M in K, then the case that e is an individual or statistical distribution 
giving the frequency of M in the first sample is of special interest. For the 
following discussion we shall, however, not restrict the form of e. With 
respect to a given e, we call any individual constant not occurring in e new, 
and likewise any individual named by a new constant. In application to a 
knowledge situation, e may report the observational results with respect 
to the individuals of a given sample. K is a class of individuals which have 
not yet been observed but which we perhaps expect to observe in the 
future. K may, for instance, be a second sample chosen from the 
population, or it may be the remainder of the whole population. An estimate for 
this latter case is often of especial interest. 

We shall now apply our earlier result on truth-frequency (T104-2) to 
the predictive estimation of the frequency of M in an unobserved sample 
K. If we take as the former class 𝔎_i of sentences the class of full sentences 
of ‘M’ for the individuals in K, then the truth-frequency in 𝔎_i is obviously 
the same as the absolute frequency of M in K. Thus we obtain the 
following results. 


T106-1. Let 𝔏 be any finite or infinite system, c a regular c-function, e 
based upon c, K a class of s new individuals in 𝔏 (s > 0), ‘M’ a molecular 
predicate in 𝔏, i_1, i_2, ..., i_s the full sentences of ‘M’ for the individuals 
in K, e a non-L-false sentence in 𝔏. Then the following holds. 


a. $e(af,M,K,e) = \sum_{n=1}^{s} c(i_n,e)$. 


Proof. This follows from T104-2a. If we take as 𝔎_i the class of the s 
i-sentences, then tf(𝔎_i) is the same as af(M,K). 


b. $e(rf,M,K,e) = \frac{1}{s} \sum_{n=1}^{s} c(i_n,e)$. (From (a), T100-3.) 


+c. If c is a symmetrical c-function and i is a full sentence of ‘M’ with 
any new in (individual constant) in 𝔏, then 
e(rf,M,K,e) = c(i,e). 

Proof. Since the individuals in K are new, i.e., their in do not occur in e, for 
every i_n (n = 1, ..., s), c(i_n,e) = c(i,e) (T91-2c). Hence the theorem, with (b). 

d. Let c be a symmetrical c-function. Let K’ be any class of s’ new 
individuals in 𝔏 (s’ > 0). (It is irrelevant whether K’ does or does not 
overlap with K.) Then 


e(rf,M,K’) = e(rf,M,K). (From (c).) 


For T1c and d we need the assumption that c has the same value for all 
new individuals. Therefore these theorems are restricted to symmetrical 
c-functions. 

The theorem T1c is of great importance. It concerns the predictive 
estimate of the relative frequency of a property M in any finite or infinite 
class based on any symmetrical c-function. The theorem says that this 
estimate is equal to the confirmation of a singular prediction for M. 
Therefore the c-value for a singular prediction may be interpreted as an 
estimate of the relative frequency of M and thus as determining a fair 
betting quotient. This interpretation has been used earlier for explaining 
probability_1 as an explicandum (§ 41D). 

It was explained earlier that the concept of rf of M in an infinite 
(denumerable) class K_ω must be based on a fixed serial order O of the 
elements of K_ω; rf(M,K_ω,O) can then be defined as the limit of rf(M,K_n), 
where K_n is the class of the first n individuals with respect to the order O. 
Therefore the estimate of rf of M in K_ω is the limit of the estimate of 
rf(M,K_n). Now, according to T1d, the latter estimate has a constant value, 
which is independent of the number n and of the choice of individuals in 
K_n. Therefore the estimate of rf(M,K_ω,O) is this same constant value and 
hence is independent of O. 
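
The content of T1c and T1d can be illustrated numerically for one particular symmetrical c-function. The rule used below, c = (k + w)/(s + kappa) with w = 1 and kappa = 2, is an assumption chosen only for illustration; this section does not specify any particular c-function, and all names in the sketch are hypothetical.

```python
from fractions import Fraction
from itertools import product


def c_singular(k, s, w=1, kappa=2):
    """Assumed symmetrical rule: confirmation that a new individual is M,
    given that k of the s individuals described in the evidence are M."""
    return Fraction(k + w, s + kappa)


def predictive_e_rf(k, s, s_new):
    """e(rf) for a class of s_new new individuals, by enumerating every
    possible distribution of M and non-M among them."""
    e_af = Fraction(0)
    for outcome in product([0, 1], repeat=s_new):
        prob, kk, ss = Fraction(1), k, s
        for x in outcome:
            p = c_singular(kk, ss)
            prob *= p if x else 1 - p   # confirmation of this individual's outcome
            kk += x                     # the outcome joins the evidence
            ss += 1
        e_af += prob * sum(outcome)
    return e_af / s_new


k, s = 3, 10
for s_new in range(1, 8):
    # The estimate equals the c-value of a singular prediction and is the
    # same for every size of the new class.
    assert predictive_e_rf(k, s, s_new) == c_singular(k, s)
```

Under this assumed rule the estimate is 1/3 for every class size tried, in agreement with what T1d asserts for symmetrical c-functions in general.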

It should be noticed that the question of the existence of the limit for 
e(rf) is quite different from that for rf. In the latter case, this question 
involves serious difficulties. For an infinite (denumerable) class K_ω the 
concept of the relative frequency of M has no direct meaning (unless either 
af(M,K_ω) or af(~M,K_ω) is finite, in which case rf(M,K_ω) = 0 or 1, 
respectively). A meaning to rf is given as a limit with respect to an order O, 
as just mentioned. It is essential that the order O be specified because the 
result depends upon its choice. For the same class K_ω, the choice of one 
order may lead to a different value of the limit and hence of rf as defined 
than the choice of another order, and for a third order there may be no 
limit. The problems here involved are of great importance and have been 
much discussed because rf(M,K_ω,O) as just defined is taken as 
explicatum for probability_2 in the frequency theories of probability proposed by 
Mises and by Reichenbach (§ 9). The question of the choice of an order 
may be answered in many cases by taking the temporal order of the 
events or observations in question. The question of the existence of the 
limit for a specified sequence of events cannot be decided empirically by 
any finite number of observations. Some philosophers have thought that 
therefore these limit statements and hence all probability_2 statements, if 
explicated as limit statements, are meaningless. However, I think, like 
Mises and Reichenbach, that this objection is based upon a too narrow 
conception of the requirement of verifiability. It is possible to state the 
existence and the value of the limit for a given sequence of events as a 
hypothetical assumption. Then inductive relations between this limit 
statement and observational reports can be established, for instance, with 
the help of the binomial law or Bernoulli’s theorem. [A few incidental re- 
marks may here be made concerning the role of the concept of rf and its 
limit in mathematical statistics. There seems to be practical agreement 
that the problems raised by the use of the limit in this context are very 
serious and deserve careful and thoroughgoing examination; and they 
have indeed been amply discussed from various points of view by Mises, 
Reichenbach, their followers, and their critics. It seems all the more sur- 
prising that some authors in modern mathematical statistics declare 
simply that they intend to use the term ‘probability’ for the relative 
frequency in an infinite population K_ω of actual or possible events. No 
reference to a limit is made, the question of the choice of an order is neither 
answered nor even raised; the term ‘relative frequency in an infinite class’ 
is innocently used as if it had as clear and unique a meaning as for a finite 
class. Other statisticians use formulations which are more cautious and 
unobjectionable. For instance, S. S. Wilks ([Statistics], p. 3) says that 
the empirically found cumulative distribution function F_n (which shows 
the absolute and thereby the relative frequencies) with increasing n 
“appears to approach a limit F_∞”; this appearance within long finite 
sequences is then taken as suggesting the construction of a mathematical 
model for the infinite sequence. Similarly, Cramér ([Statistics], pp. 148 f.) 
says that it is found by an “empirical study of the behavior of frequency 
ratios” that the rf of a certain kind of event in a sequence of n repetitions 
of a random experiment “shows a tendency to become constant as n in- 
creases”. This leads to the “conjecture that for large n the frequency ratio 
would with practical certainty be approximately equal to some assignable 
number P.” Accordingly, a number P is introduced into the axiomatic 
theory of probability (as a primitive idea) and is called the probability of 
the kind of event in question. The axioms ascribe to these probability 
numbers “the fundamental properties of frequency ratios . . . in an ideal- 
ized form”, just as the axioms of geometry ascribe to lines those properties 
in an idealized form which we find empirically with lines made by chalk. 
Still other statisticians define probability explicitly as a limit.] 
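
The dependence of the limit of rf on the order O can be illustrated with a small sketch. The encoding (1 for M, 0 for non-M) and the two particular orders are assumptions chosen for illustration. Both sequences enumerate a denumerable class containing infinitely many M and infinitely many non-M individuals, so that each may be regarded as a rearrangement of the other.

```python
from itertools import count, islice


def alternating():
    """Order O_1: M, non-M, M, non-M, ... (encoded as 1, 0, 1, 0, ...)."""
    for i in count():
        yield 1 if i % 2 == 0 else 0


def reordered():
    """Order O_2 of the same kind of class: two M's, then one non-M, repeatedly."""
    while True:
        yield 1
        yield 1
        yield 0


def rf_prefix(seq, n):
    """Relative frequency of M among the first n elements of the sequence."""
    return sum(islice(seq, n)) / n


print(rf_prefix(alternating(), 30000))   # 0.5
print(rf_prefix(reordered(), 30000))     # approximately 0.667
```

The first order yields the limit 1/2 and the second 2/3. A finite prefix, however long, can only suggest these limits and does not establish them.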

Now it is important to realize that the use of the concept of limit in 
defining the predictive estimate of rf in K_ω does not involve any problems 
or difficulties analogous to those we have just mentioned in connection 
with the definition of rf itself in K_ω. e(rf) in K_ω is independent of the order 
of the elements of K_ω. If we change the order, then other individuals take 
the first s positions and hence the rf for the first s individuals may change 
its value. On the other hand, e(rf) for s new individuals does not change 
if we take s other individuals; for, although those other individuals may 
have different empirical properties, they have the same logical status, and 
this is all that matters for e(rf), if it is based on a symmetrical c. Further, 
there is no problem of the existence of the limit for e(rf) as there is for rf. 
While rf changes with s and the course of its values is determined 
empirically, not logically, the course of values of e(rf) with increasing s is 
logically given. As we have seen (T1d) it follows from the definition of e(rf) 
that its value remains constant. Therefore there is always a limit; and 
this limit is equal to the value for any finite s. 


C. The Problem of the Reliability of a Value of Degree of Confirmation 


The possibility of regarding a c-value as an estimate of relative 
frequency also points a way to at least a tentative solution of the problem 
whether and how the reliability of a value of probability_1 or degree of 
confirmation could be measured. This problem has been discussed by only a 
few authors. Keynes gives a detailed discussion ([Probab.], chap. vi) but 
does not find a satisfactory solution. He remarks that with increasing 
relevant evidence the probability itself may either decrease or increase; 
“but something seems to have increased in either case,—we have a more 
substantial basis upon which to rest our conclusion” (p. 71). This he 
proposes to call ‘the weight of an argument’. He refers to only two previ- 
ous authors who have touched the problem, namely, Meinong ([Kries]) 
and Nitsche ([Dimensionen], esp. pp. 70-74). However, C. S. Peirce had 
indicated the same concept at a still earlier time: “Now, as the whole 
utility of probability is to insure us in the long run, and as that insurance 
depends, not merely on the value of the chance, but also on the accuracy 
of the evaluation, it follows that we ought not to have the same feeling of 
belief in reference to all events of which the chance is even. In short, to 
express the proper state of our belief, not one number but two are requisite, 
the first depending on the inferred probability, the second on the amount 
of knowledge on which that probability is based”; here Peirce adds a 
footnote: “Strictly we should need an infinite series of numbers each de- 
pending on the probable error of the last” ([Probab.] 1878, see [Papers], 
II, 421). Recently, C. I. Lewis discussed the problem without giving a 
solution ([Analysis], pp. 292-303). 

Let us now approach the problem on the basis of our results. Suppose a 
symmetrical c has been chosen, and e is based upon it. For any given e and 
M, we can then determine the predictive estimate e(rf,M,K,e), which 
holds for any nonnull class K of new individuals; suppose we find the 
value r’. We can further determine, for this estimate e(rf) = r’, the 
estimated standard error f(rf,M,K,e) (D103-1b). This, however, is in general 
not, like e(rf), independent of the cardinal number s of K. The value of f 
for the infinite class K_ω (for which e(rf,M,K_ω,e) has the same value r’) is 
the limit (for s → ∞) of the f-values for finite classes; let this value be 
f(rf,M,K_ω,e) = q. (This value is independent of any order of the 
elements of K_ω, since f(rf) for a finite class K is, although dependent upon s, 
not dependent upon the choice of individuals in K.) q is the estimated 
standard error; it measures the reliability of the estimate e(rf) = r’. 
According to T1c, c(h,e), where h is a singular prediction for M with a new 
individual, has always the same value as e(rf); hence here c(h,e) = r’. 
This suggests the idea of transferring the estimated standard error q, 
which was determined for the estimate e(rf) = r’, to the result c(h,e) = r’. 
It is true that here the value q cannot be regarded as an estimated 
standard error in the literal sense of the word ‘error’. For c is a logical function, 
and hence the determination of its value r’ cannot lead to an error (except 
in the sense of a miscalculation). But q may perhaps still be regarded in 
some sense as a measure for the reliability of the value r’ for c(h,e). This 
seems rather plausible in view of the fact that c(h,e) claims to state a fair 
betting quotient for h on e (see § 41B), which is essentially the same as 
stating a predictive estimate of the rf of M on e. Thus, if the value 
c(h,e) = r’ has been determined, the question may be raised whether 
betting according to this c-value will in the long run probably be 
successful; or, in other words, whether the estimate of rf will probably be 
accurate. It is this question that is answered by the determination of the 
value q as the ‘estimated standard error’ for c(h,e). The direct analogy 
with e(rf) helps in determining and interpreting the value q for c(h,e) only 
in the case where h is a singular prediction. In order to find a more 
general concept of the estimated standard error of c, the following procedure 
might be considered, which is applicable to any form of h. Let any h in 𝔏 
be given; let the number of different new in in h be n. Let 𝔎_h be the class 
of all sentences obtained from h by replacing these new in by likewise 
new in in 𝔏. [In technical terms, 𝔎_h is the class of correlates of h with 
respect to all those in-correlations which correlate the in occurring in e 
with themselves. If the number of different in in e is m, and the number 
of different new in in h is n, then in 𝔏_N the number of sentences in 𝔎_h is 
(N - m)!/(N - m - n)! (T40-32h).] Let us assume that c is 
symmetrical. Then all sentences in 𝔎_h have the same c-value on e (T91-2b, e’ is e). 
Therefore c(h,e) = e(rtf,𝔎_h,e) (T104-2c). Since here the c for h is again 
equal to an estimate, we may again measure the reliability of c(h,e) by 
(the smallness of) the estimated standard error of this estimate, that is, 
f(rtf,𝔎_h,e). This tentative explication seems well in accord with the 
explicandum indicated by Peirce. 
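
How the value q may separate two equal c-values resting on different amounts of evidence can be sketched for one particular symmetrical c-function. The rule c = (k + 1)/(s + 2) used below is an assumption made only for illustration and is not taken from this section; the function names are hypothetical. The estimated square error is computed as e(rf^2) - e^2(rf).

```python
from fractions import Fraction


def c_singular(k, s, w=1, kappa=2):
    """Assumed symmetrical rule for a singular prediction (illustrative only)."""
    return Fraction(k + w, s + kappa)


def af_distribution(k, s, s_new):
    """c-values for af(M) = 0..s_new among s_new new individuals,
    built up one individual at a time."""
    dist = {0: Fraction(1)}
    for step in range(s_new):
        nxt = {}
        for m, p in dist.items():
            q = c_singular(k + m, s + step)
            nxt[m + 1] = nxt.get(m + 1, 0) + p * q
            nxt[m] = nxt.get(m, 0) + p * (1 - q)
        dist = nxt
    return dist


def f_squared_rf(k, s, s_new):
    """Estimated square error of e(rf) for a class of s_new new individuals."""
    dist = af_distribution(k, s, s_new)
    e_rf = sum(p * Fraction(m, s_new) for m, p in dist.items())
    e_rf2 = sum(p * Fraction(m, s_new) ** 2 for m, p in dist.items())
    return e_rf2 - e_rf ** 2


# Two bodies of evidence giving the same c-value 1/2 for a singular prediction:
assert c_singular(1, 2) == c_singular(49, 98) == Fraction(1, 2)
print(f_squared_rf(1, 2, 100))     # 13/250 (small evidence)
print(f_squared_rf(49, 98, 100))   # 1/202  (large evidence)
```

Under this assumed rule the square error does not tend to 0 with growing s_new but to a positive limit, which is smaller for the larger body of evidence. This is the kind of second number that Peirce's remark calls for.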


§ 107. Further Theorems on Predictive Estimation of Relative Frequency 


A. A theorem is proved concerning the estimate of the rf of Q-properties 
(i.e., the strongest factual properties expressible in the system) on the null 
evidence. B. The problem of the admissibility of the null evidence as a basis of 
confirmation or estimation is discussed. C. Further theorems on the 
predictive estimation of relative frequency are derived from earlier theorems on 
degree of confirmation. One of these theorems (T31) says that if ‘Mb’ is a factual 
sentence, the estimate of the relative frequency of M is neither 0 nor 1. This 
leads to a criticism of the so-called straight rule of estimation. D. The inverse 
estimation is the estimation of the frequency in a population based on an ob- 
served sample. The inverse estimate can be derived from the predictive esti- 
mate. 


A. Frequencies of Q-Properties 


The following theorem T1 refers to those systems 𝔏 which contain 
primitive predicates for properties only, not for relations. Systems of this 
kind were called systems 𝔏^π, where π is the number of the primitive 
predicates (see §§ 31, 32). The Q-properties are the strongest factual 
properties expressible in the system; they are designated by the 
Q-predicates ‘Q_1’, etc. (A31-1, D31-1b); their number is κ = 2^π (D31-2, T31-1). 
A molecular predicate is said to have the (logical) width w if it is 
L-equivalent to a disjunction of w Q-predicates; w/κ is called its relative width 
(D32-1). 

+T107-1. Let m be an m-function in a system 𝔏^π such that (1) m is 
symmetrical, and (2) any two Q-sentences with the same in have the 
same m-value. Let c be based upon m, and e be based upon c. Let K be 
any nonnull class of individuals. 

a. For any Q-predicate ‘Q’, e(rf,Q,K,t) = 1/κ. 

Proof. e(...) = c(Qa,t) (T106-1c), = m(Qa) (T57-3). Consider the 
sentences ‘Q_1a’, ..., ‘Q_κa’, one of which is ‘Qa’. According to condition (2), these 
sentences have equal m-values. They are L-exclusive in pairs (T31-2a). Let j be 
their disjunction. Then m(j) is the sum of the m-values of the κ sentences 
(T57-1v), which is κ × m(Qa). But j is L-true (T31-2b); hence m(j) = 1. 
Therefore m(Qa) = 1/κ. Hence the assertion. 

b. If ‘M’ is a molecular predicate with the width w, e(rf,M,K,t) = w/κ. 

Proof. 1. Let w = 0. Then ‘M’ is L-empty (T32-2a). Hence 0 is the only 
possible value for af(M,K). Therefore e(af) = 0, and e(rf) = 0 (T104-4a). 
2. Let w > 0. Then ‘M’ is L-equivalent to a disjunction of w Q-predicates 
(D32-1a). The latter are L-exclusive in pairs (T31-2a). Therefore af(M) is the 
sum of the af of the w Q’s. Hence likewise for rf and for e(rf) (T99-2). Hence 
the theorem with (a). 

These results can be made plausible by the following consideration. 
Consider first T1a. The evidence is the tautology ‘t’; that means that 
empirically nothing is known about the property Q. However, the logical 
character of Q is known; in particular, it is a Q-property. The rf of Q in a 
given class K is, of course, an empirical matter and hence unknown as 
long as no empirical evidence is available. And the same holds for the 
rf-values of the other Q-properties. However, one thing is known about the 
rf-values of the Q-properties in K: their arithmetic mean is 1/κ. This is 
not a factual matter but a logical necessity. Since the κ Q-properties form 
a division (T31-2d), the sum of their rf-values is 1; hence the mean 
is 1/κ. Thus the situation is this: empirically nothing is known about Q; 
but it is logically known that Q belongs to a certain class of properties of 
the same logical nature for which the mean rf is 1/κ. Therefore it seems 
not implausible that the estimate of the rf of Q is this mean value 1/κ. 
T1b says that the estimate of the rf of M on the null evidence is equal 
to the relative width of M. This result follows from T1a. But it can be 
made plausible also directly. Although the number of those molecular 
predicate expressions in a system 𝔏 which have a given width w is 
infinite, the number of properties expressed by them is finite, if we regard 
L-equivalent expressions as expressing the same property. [The properties 
of width w correspond to the possible selections of w among the κ 
Q-properties; therefore (T40-32d) their number is $\binom{\kappa}{w}$.] It can easily be shown 
that the mean rf of the properties with the width w is w/κ. This holds 
always (technically speaking, it holds in every 𝔷). Here again it seems 
plausible that the estimate of the rf of M is this mean value of rf. 
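
The notions of width and relative width, and the remark that the mean rf of the properties of width w is w/κ whatever the facts may be, can be checked by enumeration. The sketch below assumes a system with three primitive predicates and represents each Q-property by a triple of truth-values. The two molecular predicates and the randomly chosen frequencies are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product
import random

PI = 3                                          # number of primitive predicates (assumed)
Q = list(product([True, False], repeat=PI))     # the Q-properties
KAPPA = len(Q)                                  # kappa = 2**PI = 8


def width(molecular):
    """Number of Q-properties compatible with a molecular predicate,
    the predicate being given as a truth-function of P_1, P_2, P_3."""
    return sum(1 for q in Q if molecular(*q))


def conjunction(p1, p2, p3):    # 'P_1 . P_2 . P_3'
    return p1 and p2 and p3


def disjunction(p1, p2, p3):    # 'P_1 v P_2'
    return p1 or p2


print(Fraction(width(conjunction), KAPPA))   # 1/8
print(Fraction(width(disjunction), KAPPA))   # 3/4

# An arbitrary factual situation: af of each Q-property in a nonnull class K.
random.seed(0)
counts = [random.randint(0, 20) for _ in Q]
counts[0] += 1                  # make sure K is nonnull
total = sum(counts)

# The mean rf over all properties of width w is w/kappa, whatever the counts.
for w in range(KAPPA + 1):
    props = list(combinations(range(KAPPA), w))
    mean_rf = sum(Fraction(sum(counts[i] for i in p), total) for p in props) / len(props)
    assert mean_rf == Fraction(w, KAPPA)
```

The assertion holds for every choice of counts because each Q-property belongs to the same proportion w/κ of the selections of w Q-properties.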


B. The Problem of the Null Evidence 


The results just stated (Tr) and discussed refer to the null evidence. 
Now some philosophers reject all inductive procedures based on the null 
evidence, They believe that nothing can be said about the c of a hypothesis 
or the estimate of a function as long as no empirical evidence is available. 
I think that in this extreme form the view is certainly wrong; it seems to 
me clear that at least comparative inductive judgments can be made with 
respect to the null evidence. I regard also numerical inductive judgments 
as possible; but I admit that their possibility is problematic. That is to 
say, if somebody believes that there is no adequate quantitative explicatum
for probability₁, then he can maintain this belief consistently, and
we cannot compel him to change his view in the same sense in which we
can compel any consistent thinker to accept a deductive result. But if
somebody accepts a quantitative concept c for factual evidence, then there
seem to be no good reasons for the rejection of the null evidence. To take
an example, suppose that ‘M’ is a molecular predicate with the relative
width 1/8 (for instance, ‘P₁ . P₂ . P₃’), and that nothing is known about
the distribution of M. Thus the rf of M in a given population may have
any value between 0 and 1 (both included), and the same holds for the rf
of non-M. If the rf of M is r, that of non-M is 1 − r. Can we say anything
more about the two values? It seems to me clear that we have more reason
to expect that the rf of M is below that of non-M than to expect the
converse, although either case is possible and neither is certain. And 
therefore, for any individual a, it seems to me clear that there is more rea- 
son to expect that a is non-M than that it is M. If we are willing to admit 
inductive reasoning at all, although its success cannot be deductively 
demonstrated, then we can hardly reject these comparative judgments. 
Now let us examine the question of numerical values for confirmation 
and estimation. For the sake of this discussion, let us assume the skeptical 
view (as held by Kries, Keynes, Nagel, and others) that in most cases 
comparative judgments at best are possible and that numerical values 
can be stated only in a special kind of case. Let us consider a case of this 
special kind. The following evidence e is given: a bag contains 80 balls of
which 10 are white, and the ball b is now drawn at random from the bag.
Let h be the hypothesis that b is white. I presume that this is a case where
even most of the skeptics will admit not only the comparative judgment
that the c for h on e is less than for non-h, but also the quantitative
judgment that c(h,e) = 1/8. However, does the skeptic have any means to
compel the ultra-skeptic, who admits even in this case only the comparative
judgment, to change his mind and to accept also the numerical
judgment? He is certainly unable to do so; but he can show that the
numerical judgment is plausible and in accord with customary inductive
thinking. Now consider again the predicate ‘M’ with the relative width
1/8. With respect to the numerical judgment that c for ‘Ma’ on the null
evidence is 1/8 and that the estimate for the rf of M on the null evidence 
is 1/8, my relation to the skeptic who rejects numerical judgments on the 
null evidence is the same as the relation of the skeptic to the ultra-skeptic 
in the example of the ball. I cannot compel the skeptic, but I can give 
plausibility reasons which seem just as good as the reasons the skeptic 
gives to the ultra-skeptic. Since the mean rf of those properties which have 
the same width as M is 1/8 and that for non-M is 7/8, it seems plausible 
to say not only that the estimate for the rf of M must be smaller than 
that of non-M but also that it must be exactly one-seventh of it and hence 
must be 1/8. For the same reason, the only fair betting quotient on ‘Ma’ 
is 1/8, and hence the probability₁ of ‘Ma’ should be regarded as 1/8.
When we say that, in our example, the estimate for the rf of M on the
null evidence is 1/8, we do not mean to say that this is a good estimate.
It means merely that this is the best estimate that can be made by an
observer X who has no factual evidence. But the best in this case is still
not good. It must be admitted that this estimate is very unreliable
because it has only very little support. If X had observed a very large sample
in which M had the rf 1/8, he might perhaps find as estimate of the rf of M
in any unobserved class again 1/8; the same value of the estimate would
in this case have much more support and hence be more reliable. However,
the fact that the estimate in the first case, on the null evidence, has only
a low reliability, does not nullify the inductive validity of that estimate.


C. Further Theorems on the Estimate of Relative Frequency 


Since the estimate of rf is equal to a certain value of c (T106-1c), we can
derive further theorems on e(rf) from earlier theorems on c (mostly from 
§ 59). 

T107-3. Let 𝔏, e, e, and K be as in T106-1. Let c be a symmetrical
c-function. Let ‘M’ and ‘M′’ be molecular predicates in 𝔏, and ‘b’ any
new in in 𝔏. Then the following holds.

a. If ⊢ e ⊃ Mb (hence, in particular, if ‘M’ is L-universal (D25-1a)),
e(rf,M,K,e) = 1. (From T106-1c, T59-1b.)

b. If ⊢ e ⊃ ~Mb (hence, in particular, if ‘M’ is L-empty (D25-1b)),
e(rf,M,K,e) = 0. (From T106-1c, T59-1e.)

c. General addition theorem.
e(rf,M ∨ M′,K,e) = e(rf,M,K,e) + e(rf,M′,K,e) − e(rf,M . M′,K,e).
(From T106-1c, T59-1k.)

+d. Special addition theorem. If ‘M’ and ‘M′’ are L-exclusive with respect
to e (D25-3b), then
e(rf,M ∨ M′,K,e) = e(rf,M,K,e) + e(rf,M′,K,e). (From (c), (b);
analogous to T59-1l.)

e. e(rf,M . M′,K,e) = e(rf,M,K,e) + e(rf,M′,K,e) − e(rf,M ∨ M′,K,e).
(From (c); analogous to T59-1q.)

f. If ⊢ e ⊃ Mb ∨ M′b (hence, in particular, if ‘M’ and ‘M′’ are L-disjunct,
in other words, ‘M ∨ M′’ is L-universal), then
e(rf,M . M′,K,e) = e(rf,M,K,e) + e(rf,M′,K,e) − 1. (From (e), (a).)

+g. Special multiplication theorem. If ‘Mb’ is irrelevant to ‘M′b’ on
evidence e (D65-1d), then
e(rf,M . M′,K,e) = e(rf,M,K,e) × e(rf,M′,K,e).
Proof. If the condition is fulfilled, c(Mb . M′b,e) = c(Mb,e) × c(M′b,e)
(T65-61, the special multiplication theorem for c). Hence the assertion with
T106-1c.

+h. e(rf,~M,K,e) = 1 − e(rf,M,K,e). (From T106-1c, T59-1p.)

i. If ‘M’ L-implies ‘M′’, then e(rf,M,K,e) ≤ e(rf,M′,K,e). (From T106-1c,
T59-2d.)

j. e(rf,M . M′,K,e) ≤ e(rf,M,K,e). (From (i); analogous to T59-2e.)

k. e(rf,M,K,e) ≤ e(rf,M ∨ M′,K,e). (From (i); analogous to T59-2f.)

+l. If ‘M’ is a factual predicate (D25-1c), in other words, ‘Mb’ is a factual
sentence (T25-1c), then (1) neither ‘Mb’ nor ‘~Mb’ is L-implied
by e, and (2) if 𝔏 is finite or e is nongeneral, then 0 < e(rf,M,K,e) < 1.

Proof. 1. ‘~Mb’ is also factual (T20-6a). e is either L-true or factual. If e is
L-true, it cannot L-imply either of the two factual sentences (T20-2c). If e is
factual, (1) follows from T21-1c. 2. c(Mb,e) is > 0 (from (1), T59-5d) and < 1
(from (1), T59-5a). Hence the assertion (2) with T106-1c.
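
Several parts of T107-3 can be checked numerically for one particular symmetrical c-function. The Python sketch below is an illustration under stated assumptions: it takes as c the function c* outlined in the appendix (§ 110), for which the estimate of rf for a property of width w, given a sample of s individuals of which s_M are M, is (s_M + w)/(s + κ). The function name `est_rf`, the representation of a molecular property as a set of Q-property indices, and the sample Q-numbers are all hypothetical.

```python
from fractions import Fraction


def est_rf(prop, sample):
    """Estimate of rf for a property, assuming c* as the c-function.

    prop   : set of indices of the Q-properties whose disjunction is M
    sample : observed Q-numbers, one per Q-property (hypothetical data)
    """
    kappa = len(sample)
    s = sum(sample)
    s_m = sum(sample[i] for i in prop)
    return Fraction(s_m + len(prop), s + kappa)


sample = [3, 0, 1, 2]                   # kappa = 4, s = 6
universal = frozenset(range(4))         # L-universal property
m, m2 = frozenset({0, 1}), frozenset({1, 2})

assert est_rf(universal, sample) == 1                                          # (a)
assert est_rf(frozenset(), sample) == 0                                        # (b)
assert est_rf(m | m2, sample) == est_rf(m, sample) + est_rf(m2, sample) - est_rf(m & m2, sample)  # (c)
assert est_rf(universal - m, sample) == 1 - est_rf(m, sample)                  # (h)
assert est_rf(m & m2, sample) <= est_rf(m, sample) <= est_rf(m | m2, sample)   # (j), (k)
assert 0 < est_rf(m, sample) < 1                                               # (l)
```

Disjunction, conjunction, and negation of properties correspond here to union, intersection, and complement of the index sets.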

In most applications of predictive inference and predictive estimation 
the property M in question is factual, that is, neither L-universal nor
L-empty (D25-1c); hence ‘Mb’ is a factual sentence. T3l says that in this
case the predictive estimate of rf cannot be either 0 or 1. This is quite
plausible. Even if the finite sample described in e is very large and none of
its individuals has been found to be M, the case that a finite class K of new
individuals contains at least one element which is M is certainly not
impossible although it may be highly improbable. Therefore the probability₁
of this case, although small, cannot be 0, because it is a possible case in a
finite domain. Therefore the estimate of af, that is, the probability₁-weighted
mean, must be positive, and hence likewise the estimate of rf. A
frequently used method of predictive estimation of frequencies is based
on what might be called the straight rule of estimation; this rule determines
the predictive estimate of rf as equal to the observed rf. Thus, if the observed
sample does not contain any individual with M and hence the observed
rf of M is 0, the rule says that e(rf) = 0; and if the observed rf is 1, the
rule says that e(rf) = 1. T3l shows that these results are not in agreement
with any c-mean estimate-function based on any symmetrical c-function.
For this and other reasons it seems to me that the straight rule violates 
principles which seem to be generally accepted explicitly or implicitly in 
the customary ways of inductive thinking. A detailed critical analysis of 
the straight rule and related inductive methods will be given in a later 
chapter (in Vol. II). 


D. The Inverse Estimation of Frequency 


The inductive inference from a sample to the population was called 
inverse (or upward) inference (§ 44B). Similarly we shall call the estima- 
tion of a frequency in the population based on an observed sample inverse 
(or upward) estimation. It is easily seen that the inverse estimate of af for
a finite population can be derived from the predictive estimate. In this
case we take as K not a limited second sample but the whole remainder,
that is, the class of all individuals in the population not belonging to the 
first sample described in e. The af in the population is, of course, the sum 
of the af in the first sample and that in the remainder. The inverse esti- 
mate of rf is likewise determined by the given rf in the sample and the 
predictive estimate of rf in the remainder (since the cardinal numbers of 
the sample and the remainder are given by the definitions of the sample 
and the population). As explained earlier, the concept of rf, and hence also 
e(rf), can also be applied to an infinite class Ke, provided a fixed serial 
order for the individuals is established. In many statistical problems the 
population is either infinite or very large in comparison with the first 
sample; in cases of this kind the rf in the population is either exactly or 
approximately equal to that in the remainder. 
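
The derivation of the inverse estimate from the predictive estimate can be sketched in a few lines of Python. This is an illustration only: the function name `inverse_estimate_rf` and the numbers are hypothetical, and the predictive estimate for the remainder is simply supplied as an input, since its value depends on the c-function chosen.

```python
from fractions import Fraction


def inverse_estimate_rf(s_m, s, n, predictive_rf):
    """Inverse estimate of the rf of M in a finite population.

    s_m           : observed number of M's in the first sample
    s             : size of the first sample
    n             : size of the whole population (n >= s)
    predictive_rf : predictive estimate of the rf of M in the remainder
    """
    remainder = n - s
    # Estimated af in the population = known af in the sample
    # plus estimated af in the remainder.
    estimated_af = s_m + remainder * predictive_rf
    return estimated_af / n


# Hypothetical case: 10 of 80 observed are M; predictive estimate 1/8.
print(inverse_estimate_rf(10, 80, 1000, Fraction(1, 8)))   # 1/8
# For a very large population the result approaches the predictive estimate.
print(float(inverse_estimate_rf(10, 80, 10**9, Fraction(1, 4))))
```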

The inverse estimate of rf plays an important role in many theories of 
induction or statistics. For instance, Reichenbach’s rule of induction, if 
interpreted as a rule of estimation (cf. § 41E), applies only to this special 
case of estimation, and his whole theory of induction is based upon this 
rule. In mathematical statistics various methods have been developed for 
estimating the values of parameters characterizing a distribution of prop- 
erties or quantitative magnitudes in a population; and one of the simplest 
and most fundamental of these parameters is the rf of a property. The rf 
of a property in the whole population, especially in the case of an infinite
population, is called ‘probability’ both by Mises and Reichenbach and by
contemporary statisticians. This is the second meaning of the term
‘probability’ (in our terminology, probability₂). The suitability of the use
of the word in this sense might be debated, but the fact that a simple
term has been chosen is a clear indication of the importance of the concept.
The inverse estimate of rf is thus the estimate of probability₂; in
our method, this estimate is based upon probability₁.

Theorems on predictive and inverse estimation of rf, stating not only 
relations between values but the values themselves, will later (in Vol. II) 
be given on the basis of our functions c* and e*. 


APPENDIX 


§ 110. Outline of a Quantitative System of Inductive Logic 


This appendix gives a brief summary of the system of quantitative inductive 
logic to be constructed in Volume II. This system is the theory of a certain
function c* which is proposed as a quantitative explicatum of probability₁. The
definition of c* is here given, but, for the sake of brevity, not the reasons for its
choice. Furthermore, a few theorems are stated without proofs. m* is defined
in such a way that it is symmetrical and has equal values for all
structure-descriptions. c* is the c-function based upon m* (A). Theorems concerning c*
are stated for the principal kinds of inductive inference earlier explained 
(§ 44 B): the direct inference (B), the predictive inference (C), the inference 
by analogy (D), the inverse inference (E), and the universal inference (F). The 
latter concerns universal laws. In connection with it, the concept of the instance 
confirmation of a law is introduced as an explicatum for what a scientist or an 
engineer means when he says that a given law of physics is “reliable” or “well- 
founded” (G). It is shown that for predicting a future event on the basis of ob- 
servations made, it is not necessary to make use of laws; the prediction can be 
inductively inferred directly from the observations (H). The requirement of 
the variety of instances in testing a law is briefly discussed (I). The system of 
inductive logic here outlined is meant as a rational reconstruction and systema- 
tization of customary inductive reasoning (J). 


A. The Function c* 


In Volume II a quantitative system of inductive logic will be con- 
structed, based upon an explicit definition of a particular c-function c* and 
containing theorems concerning the various kinds of inductive inference 
and especially of statistical inference in terms of c*. In the present appen- 
dix we shall indicate the definition of c* and briefly mention a few of the 
theorems on c*, omitting proofs and technical details. 

Space does here not permit an explanation of the reasons for choosing 
just the function c* out of an infinite number of c-functions. The reasons 
are chiefly negative, in the following sense. A critical examination of vari- 
ous quantitative inductive methods which have been proposed from the 
classical period to our time will be given in Volume II. These methods
include both those for the calculation of numerical values of probability₁
and those for the calculation of estimates of parameters characterizing a 
population (inverse estimation) or a second sample (predictive estimation) 
on the basis of a given sample, especially estimates of relative frequency. 
The examination will refer only to those methods which are not merely 
constructed for a special kind of case but have a general character. It will
be shown that the general methods to be examined, from Laplace’s rule 
of succession down to R. A. Fisher’s method of maximum likelihood, show 
certain disadvantages. Each of these methods leads in certain cases to 
numerical values which are not in accord with the implicit principles of 
customary inductive thinking as represented by the judgments of good 
scientists or careful, rational bettors. This result does not exclude the 
possibility that any of these methods may be adequate and useful within 
an extensive field; this is certainly the case, e.g., with Fisher’s method 
mentioned above. Now the chief arguments in favor of the function c*, 
though there are also a few others of a more positive nature, will consist 
in showing that this function is free of the inadequacies found in the other 
methods. It may then still be inadequate in other respects. It will not be 
claimed that c* is a perfectly adequate explicatum for probability₁, let
alone that it is the only adequate one. For the time being it would be
sufficient that c* be a better explicatum than the previous methods (if indeed
it is); in the future still better explicata may be found. 


From the preceding remarks it seems clear that a criticism of the system 
based on c* would hardly be useful at the present time, that is, before the publi- 
cation of the full explanation of this system, if it would merely raise the ob- 
jection that the basis of the system seems arbitrary, that is to say, that the 
reasons for the choice of c* as an explicatum are not clear. On the other hand, 
if it could be shown that another method, for instance, a new definition for 
degree of confirmation, leads in certain cases to numerical values more adequate 
than those furnished by c*, that would constitute an important criticism. Or, if 
someone, even without offering an explicatum, were to show that any adequate 
explicatum must fulfil a certain requirement and that c* does not fulfil it, it 
might be a helpful first step toward a better solution. 


We take as the basis of our system of inductive logic that m-function
m* which fulfils the following two conditions:
(1) a. m* is a symmetrical m-function (D90-1 and 2).
    b. m* has the same value for all Str (structure-descriptions, D27-1)
       in 𝔏_N.

It is easily seen that there is exactly one m-function for 𝔏_N which fulfils
these two conditions and that it is the function m* defined as follows:
(2) Let ℨᵢ be any ℨ (state-description, D18-1a) in 𝔏_N. Let τ be the
    number of Str in 𝔏_N and ζᵢ the number of those ℨ in 𝔏_N which
    are isomorphic to ℨᵢ (§§ 26 f.). Then we define:
        m*(ℨᵢ) =Df 1/(τζᵢ).
This defines m* as an m-function for the ℨ in 𝔏_N. We extend it to the
sentences of 𝔏_N by our earlier procedure (D55-2).
The function m* thus defined does indeed fulfil the conditions (1)(a)
and (b).

Proof. It follows from (2) that all ℨ isomorphic to ℨᵢ have the same
m*-value; hence (1)(a) is fulfilled. Let Strᵢ be the Str corresponding to ℨᵢ (D27-1a).
Then ℜ(Strᵢ) is the class of those ℨ which are isomorphic to ℨᵢ (T27-2f).
Therefore m*(Strᵢ) = ζᵢ × m*(ℨᵢ) = 1/τ. Since this holds for every Str, (1)(b)
is fulfilled.

c* is then defined as the c-function based upon m* (D55-3). All these
definitions refer to 𝔏_N. The functions m* and c* for 𝔏_∞ are then defined
by our earlier limit-procedure (D56-1 and 2). c* is our concept of degree of
confirmation, that is, the concept which we propose as a quantitative
explicatum for probability₁ in application to our systems 𝔏. Our system of
inductive logic is the theory of c*.
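
For a very small finite system the definitions of m* and c* can be carried out by brute force. The following Python sketch is an illustration under simplifying assumptions, not the procedure of Volume II: the system contains only properties, a state-description is represented as a tuple assigning one of κ Q-properties to each of N individuals, and two state-descriptions count as isomorphic when they have the same Q-numbers. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def q_numbers(state, kappa):
    """Q-numbers of a state-description: how many individuals have each Q-property."""
    return tuple(state.count(q) for q in range(kappa))


def m_star_table(n, kappa):
    """m* for every state-description of a system with n individuals."""
    states = list(product(range(kappa), repeat=n))
    # zeta[structure] = number of state-descriptions isomorphic to one another.
    zeta = Counter(q_numbers(z, kappa) for z in states)
    tau = len(zeta)  # number of structure-descriptions
    return {z: Fraction(1, tau * zeta[q_numbers(z, kappa)]) for z in states}


def c_star(table, h, e):
    """c*(h,e) = m*(e . h) / m*(e); h and e are predicates on state-descriptions."""
    m_e = sum(m for z, m in table.items() if e(z))
    m_eh = sum(m for z, m in table.items() if e(z) and h(z))
    return m_eh / m_e


# One primitive predicate P (Q-property 0 is P, 1 is non-P), four individuals.
table = m_star_table(4, 2)
assert sum(table.values()) == 1
e = lambda z: z[:3] == (0, 0, 0)   # the first three individuals are P
h = lambda z: z[3] == 0            # the fourth individual is P
print(c_star(table, h, e))         # 4/5
```

The number of state-descriptions grows as κ^N, so this enumeration is feasible only for toy systems.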

It seems to me that there are good and even compelling reasons for the 
stipulation (1)(a), i.e., the choice of a symmetrical function. The proposal 
of any nonsymmetrical c-function as degree of confirmation could hardly
be regarded as acceptable; this was shown by earlier explanations (§ 90).
The same cannot be said, however, for the stipulation (1)(b). No doubt, to
the way of thinking which was customary in the classical period of the
theory of probability, (1)(b) would appear as validated, like (1)(a), by the
principle of indifference. However, to modern, more critical thought, this
mode of reasoning appears as invalid because the structure-descriptions 
(in contradistinction to the individual constants) are by no means alike 
in their logical features but show very conspicuous differences. The defi- 
nition of c* shows a great simplicity in comparison with other functions 
which might be taken into consideration. Although this fact may influence 
our decision to choose c*, it cannot, of course, be regarded as a sufficient 
reason for this choice. It seems to me that the choice of c* cannot be
justified by any features of the definition which are immediately recog- 
nizable, but only by studying the consequences to which the definition 
leads and especially by comparing them with the consequences of other 
definitions. This will be done in Volume II. 

There is another c-function c† which at first glance appears not less
plausible than c*. The choice of this function may be suggested by the
following consideration. Prior to experience, there seems to be no reason
to regard one ℨ as less probable than another. Accordingly, it might seem
natural to assign equal m-values to the ℨ instead of to the Str. This is
done in the following definition of m† for the ℨ:

(3) m†(ℨᵢ) = 1/ζ,

where ζ is the number of the ℨ in 𝔏_N. This is still simpler than the
definition (2) for m*. The measures ascribed to the ranges are here simply
taken as proportional to the cardinal numbers of the ranges. m† is then
extended to sentences as before, and c† is defined as the c-function based
upon m†. Earlier authors have often discussed the problem whether the
principle of indifference should be applied to individual distributions
(often called ‘constitutions’) or to statistical distributions (frequencies),
in other words, whether the former should be regarded as a priori
equiprobable or the latter. It will be shown in Volume II that both sides were
wrong. The principle of the equiprobability of individual distributions, if
applied to the whole universe, would lead to the equiprobability of all ℨ,
and hence to m† and c†. This principle has been accepted by some prominent
writers, among them C. S. Peirce ([Theory], see [Papers], II, 470 f.),
Keynes ([Probab.], pp. 56 f.), Wittgenstein ([Tractatus] *5.15). However,
in spite of its apparent plausibility, the function c† can easily be seen to be
entirely inadequate as a concept of degree of confirmation. As an example,
consider the system 𝔏₁₀₁ with ‘P’ as the only pr. Let e be the conjunction
‘Pa₁ . Pa₂ . Pa₃ . ... . Pa₁₀₀’ and let h be ‘Pa₁₀₁’. Then e . h is a ℨ and
hence m†(e . h) = 1/ζ. e holds only in the two ℨ, e . h and e . ~h; hence
m†(e) = 2/ζ. Therefore c†(h,e) = 1/2. If e′ is formed from e by replacing
some or even all of the atomic sentences with their negations, we obtain
likewise c†(h,e′) = 1/2. Thus the c†-value for the prediction that a₁₀₁ is P
is always the same, no matter whether among the hundred observed
individuals the number of those which have been found to be P is 100 or 50
or 0 or any other number. Thus the choice of c† as the degree of confirmation
would be tantamount to the principle never to let our past experiences
influence our expectations for the future. This would obviously be
in striking contradiction to the basic principle of all inductive reasoning.
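
The inadequacy of a c-function that gives equal measure to all state-descriptions can be exhibited by direct enumeration. The Python sketch below is an illustration under an obvious simplification: to keep the enumeration small it uses six individuals instead of 101 (five observed, one predicted), and the function name `c_dagger` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def c_dagger(n, h, e):
    """Degree of confirmation when every state-description has equal measure.

    With one primitive predicate, a state-description is a tuple of n
    truth-values (True: the individual is P).
    """
    states = list(product([True, False], repeat=n))
    e_states = [z for z in states if e(z)]
    return Fraction(sum(1 for z in e_states if h(z)), len(e_states))


h = lambda z: z[5]  # prediction: the sixth individual is P
for observed in product([True, False], repeat=5):
    # Evidence: a complete description of the first five individuals.
    e = lambda z, obs=observed: z[:5] == obs
    assert c_dagger(6, h, e) == Fraction(1, 2)
```

Whatever the five observed individuals are found to be, the value for the prediction remains 1/2, so that no learning from experience takes place.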

The second of the two controversial principles, which declares statisti- 
cal distributions as equiprobable, is likewise wrong. In the general form 
in which it is usually stated it leads to contradictions. It is, however, con- 
sistent if it is applied only to the Q-division. In this case it asserts the 
equiprobability of the Str, which correspond to the statistical distribu- 
tions for the Q-division (T34-6), and thus leads to m* and c*. 

The preceding considerations show that the following argument, ad- 
mittedly not a strong one, can be offered in favor of m*. Of the two m- 
functions which are most simple and suggest themselves as the most natu- 
ral ones, m* is the only one which is not entirely inadequate. 

The definitions of m* and c* indicated above are formulated in a general
way so as to apply to all our systems 𝔏. But the greater part of our system
of inductive logic, including all theorems mentioned in the remainder of
this section, will be restricted to the systems 𝔏^π which contain only pr of
degree one (§ 31). This restriction to properties is customary in theories
on probability₁. An extension of this part of inductive logic to relations
would require certain results in the deductive logic of relations, results
which this discipline, although widely developed in other respects, has not
yet reached (for example, an answer to the apparently simple question
as to the number of structures in a given finite language system). We shall
make use of the following concepts earlier explained in connection with
the systems 𝔏^π: the Q-properties (§ 31); the number π of the pr; the
number κ of the Q-properties, which is 2^π (T31-1); the width w of a property
and its relative width w/κ (D32-1); the Q-numbers (§ 34). Here, for 𝔏^π_N
(in contradistinction to systems containing relations), it is easy to state
explicit functions for τ (T35-1d) and ζᵢ for a ℨᵢ with the Q-numbers N₁,
N₂, ..., N_κ (T35-4). Substituting in (2) the values given by the theorems
just mentioned, we obtain:

(4)  $$m^*(\mathfrak{Z}_i) = \frac{N_1!\,N_2!\cdots N_\kappa!\,(\kappa-1)!}{(N+\kappa-1)!}$$

This result serves as a basis for all further theorems.
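
The closed form for m* of a state-description, N₁! N₂! ... N_κ! (κ − 1)! / (N + κ − 1)!, which results from substituting τ = (N + κ − 1)! / (N! (κ − 1)!) and ζᵢ = N! / (N₁! ... N_κ!) in (2), can be checked for small systems. The Python sketch below is an illustration; the function name `m_star_closed` is hypothetical. It verifies that the values over all state-descriptions sum to 1.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def m_star_closed(q_numbers):
    """m* of a state-description with the given Q-numbers N_1, ..., N_kappa."""
    kappa = len(q_numbers)
    n = sum(q_numbers)
    numerator = prod(factorial(k) for k in q_numbers) * factorial(kappa - 1)
    return Fraction(numerator, factorial(n + kappa - 1))


# Sum over all state-descriptions for kappa = 3 Q-properties and N = 4 individuals.
kappa, n = 3, 4
total = sum(
    m_star_closed([state.count(q) for q in range(kappa)])
    for state in product(range(kappa), repeat=n)
)
assert total == 1
print(m_star_closed([2, 1]))  # 1/12
```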

Let j be a nongeneral sentence in 𝔏^π_N. The application of (4) to all ℨ in
ℜⱼ furnishes an effective procedure for the computation of m*(j). However,
since the number of ℨ becomes very large even for small systems
(see T35-2), this procedure, although effective, is impracticable, that is,
too lengthy for practical purposes. Another procedure for the computation
of m*(j) which is practicable if the number of in in j is not too large will
be explained in Volume II.

m*(i) in 𝔏_∞ is defined by a limit (D56-1). The question arises under
what conditions this limit exists. We have to distinguish two cases. (i) Suppose
that i is nongeneral. Here the situation is simple; it can be shown that
in this case m*(i) is the same in all finite systems in which i occurs; hence
it has the same value also in 𝔏_∞. (ii) Let i be general. Here the situation is
quite different. For a given 𝔏_N, i can of course easily be transformed into
an L-equivalent sentence i_N without variables (T22-3). The values of
m*(i_N) are in general different for each N; and although the simplified
procedure mentioned above is available for the computation of these
values, this procedure becomes impracticable even for moderate N. Thus
for general sentences the problem of the existence and the practical
computability of the limit becomes serious. It can be shown that for every
general sentence the limit exists; hence m* has a value for all sentences
in 𝔏_∞. Moreover, an effective procedure for the computation of m*(i) for
any sentence i in 𝔏_∞ has been constructed. This is based on a procedure
for transforming any given general sentence i into a nongeneral sentence
i′ such that i and i′, although not necessarily L-equivalent, have the same
m*-value in 𝔏_∞ and i′ does not contain more in than i; this procedure is
not only effective but also practicable for sentences of customary length.
Thus, the computation of m*(i) for a general sentence i is in fact much
simpler for 𝔏_∞ than for a finite system 𝔏_N with a large N.

With the help of the procedure mentioned, the following theorem is
obtained:

(5) If i is a purely general sentence (D16-6g) in 𝔏_∞, then m*(i) is
either 0 or 1.

For a sentence of the form ‘(x)(Mx)’, where ‘M’ is a factual molecular
predicate, m* is 0. This leads to the later result (12).


B. The Direct Inference 


One of the most important tasks of inductive logic is to furnish general 
theorems on the degree of confirmation in the various cases called kinds 
of inductive inference which were earlier explained (§ 44B). We shall now 
indicate some results of this kind concerning c*. Inductive inferences are 
of special importance when they become statistical inferences, that is to 
say, when e or h or both give statistical information, e.g., concerning the 
absolute or relative frequencies of given properties in a population or a 
sample. Most of the subsequent theorems are of this kind. 

The direct inference is the inference from the population to a sample. 
Here it is not necessary to state special theorems on c* because the theo- 
rems stated earlier for all symmetrical c-functions (§§ 94-96) hold, of 
course, for c* too. We have seen that these theorems state the same values 
as classical theorems, including the binomial law (§ 95) and the various 
parts of Bernoulli’s theorem (§ 96), although the restricting conditions
and the interpretations are somewhat different in some cases. 


C. The Predictive Inference 


The predictive inference is the inference from one sample to another. 
Let the properties Mᵢ (i = 1 to p) form a division (D25-4). Let e be an
individual distribution (D26-6a) saying that in a first sample of s
individuals sᵢ specified ones are Mᵢ (i = 1 to p); let e′ be the statistical
distribution corresponding to e (D26-6b); let h be a statistical distribution
(D26-6c) for the same division, but for a second sample of s′ other
individuals with the cardinal numbers s′ᵢ; let the width of Mᵢ be wᵢ. Then
the following holds:

(6)  $$c^*(h,e) = c^*(h,e') = \frac{\prod_{i=1}^{p}\binom{s_i + s_i' + w_i - 1}{s_i'}}{\binom{s + s' + \kappa - 1}{s'}}$$

The predictive inference is the most important kind of inductive inference.
The other kinds which will be discussed here may be construed as
special cases of the predictive inference. Therefore the further theorems in 
this section can be derived from (6). 

The most important special case of the predictive inference is the singular
predictive inference. Here h is a singular prediction ‘Mc’ (with ‘M’ for
‘M₁’), where ‘c’ is an in not occurring in e. e and e′ are as before. In this case

(7)  $$c^*(h,e) = c^*(h,e') = \frac{s_1 + w_1}{s + \kappa}$$

Laplace’s much-debated rule of succession gives in this case simply the
value (s₁ + 1)/(s + 2) for any property whatever; this, however, if applied to
different properties, leads to contradictions. Other authors state the value
s₁/s, that is, they take simply the observed relative frequency as the
probability for the prediction that an unobserved individual has the
property in question. We call this the straight rule. This rule, however, leads to
quite implausible results. If s₁ = s, e.g., if three individuals have been
observed and all of them have been found to be M, the last-mentioned
rule gives the probability for the next individual being M as 1, which
seems hardly acceptable (see § 47 A). According to (7), c* is influenced by
the following two factors (though not uniquely determined by them):

(i) w₁/κ, the relative width of M;
(ii) s₁/s, the relative frequency of M in the observed sample.

The factor (i) is purely logical; it is determined by the semantical rules.
(ii) is empirical; it is determined by observing and counting the individuals
in the sample. The value of c* always lies between those of (i) and (ii).
Before any individual has been observed, c* is equal to the logical
factor (i). As we first begin to observe a sample, c* is influenced more by this
factor than by (ii). As the sample is increased by observing more and more
individuals (but not including the one mentioned in h), the empirical
factor (ii) gains more and more influence upon c*. Let us assume that, as the
sample increases, the relative frequency of M continues to have the same
value r = s₁/s. In this case c* moves slowly toward this value r, which it
approaches as a limit; when the sample is sufficiently large, c* is practically
equal to r. This result seems more adequate than the value c = s₁/s of the
straight rule; but the latter is acceptable as an approximation in the case
of sufficiently large samples.

According to our previous analysis of estimation (T106-1c), the value 
of c* stated in (7) is likewise the estimate of the relative frequency of M 
within any unobserved class. 
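
The behavior of c* in the singular predictive inference can be traced numerically. The Python sketch below is an illustration which assumes that (7) has the form c*(h,e) = (s₁ + w₁)/(s + κ); the function names and the chosen numbers (a property of relative width 1/8 whose observed relative frequency stays at 1/2) are hypothetical.

```python
from fractions import Fraction


def c_star_singular(s1, s, w1, kappa):
    """c* for the prediction that a new individual is M, assuming the form (s1 + w1)/(s + kappa)."""
    return Fraction(s1 + w1, s + kappa)


def straight_rule(s1, s):
    """Observed relative frequency taken as the value for the prediction."""
    return Fraction(s1, s)


w1, kappa = 1, 8
assert c_star_singular(0, 0, w1, kappa) == Fraction(1, 8)  # null evidence: the relative width

# Observed relative frequency fixed at r = 1/2 while the sample grows.
for s in (2, 20, 200, 2000):
    value = c_star_singular(s // 2, s, w1, kappa)
    assert Fraction(1, 8) < value < Fraction(1, 2)  # between factors (i) and (ii)
    print(s, float(value))

# Three individuals observed, all M: the straight rule gives 1, c* gives 4/11.
print(straight_rule(3, 3), c_star_singular(3, 3, w1, kappa))
```

The printed values move from near the logical factor 1/8 toward the empirical factor 1/2 as the sample grows.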


D. The Inference by Analogy 


Here the situation is as follows. The evidence known to us is the fact 
that individuals b and c agree in certain properties and, in addition, that 
b has a further property; thereupon we consider the hypothesis that c too 
has this property. Logicians have always felt that a peculiar difficulty is 
here involved. It seems plausible to assume that the probability of the 
hypothesis is the higher the more properties b and c are known to have in 
common; on the other hand, it is felt that these common properties 
should not simply be counted but weighed in some way. This becomes 
possible with the help of the concept of width. Let M_1 be the conjunction of 
all properties which b and c are known to have in common. The known 
similarity between b and c is the greater the stronger the property M_1, hence 
the smaller its width. Let M_2 be the conjunction of all properties which b 
is known to have. Let the width of M_1 be w_1, and that of M_2, w_2. 
According to the above description of the situation, we presuppose that M_2 
L-implies M_1 but is not L-equivalent to M_1; hence w_1 > w_2. Now we take 
as evidence the conjunction e . j; e says that b is M_2, and j says that c is 
M_1. The hypothesis h says that c has not only the properties ascribed to it 
in the evidence but also the one (or several) ascribed in the evidence to b 
only, in other words, that c has all known properties of b, or briefly that c 
is M_2. Then 

(8)    $c^*(h, e \cdot j) = \frac{w_2 + 1}{w_1 + 1}$ ;

j and h speak only about c; e introduces the other individual b, which serves 
to connect the known properties of c expressed by j with its unknown 
properties expressed by h. The chief question is whether the degree of 
confirmation of h is increased by the analogy between c and b, in other words, by 
the addition of e to our knowledge j. An affirmative answer to this 
question can be derived from (8). However, the increase of c* is under ordinary 
conditions rather small; this is in agreement with the general conception 
according to which reasoning by analogy, although admissible, can usually 
yield only rather weak results. 
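
The size of the increase can be checked numerically. The following is a minimal sketch, not part of the original text. It assumes, as a step not stated above, that c*(h, j) = w_2/w_1, since for c* a property of width w has the null confirmation w/κ for a single individual. The function names and the sample widths are purely illustrative.

```python
from fractions import Fraction


def c_star_without_analogy(w1: int, w2: int) -> Fraction:
    """c*(h, j): c is known to be M_1 (width w1); h says that c is M_2 (width w2).

    Assumption of this sketch: c*(h, j) = w2 / w1.
    """
    return Fraction(w2, w1)


def c_star_with_analogy(w1: int, w2: int) -> Fraction:
    """c*(h, e . j) as in (8): in addition, b is known to be M_2."""
    return Fraction(w2 + 1, w1 + 1)


# Hypothetical widths, chosen only for illustration (w1 > w2 is presupposed).
w1, w2 = 8, 2
before = c_star_without_analogy(w1, w2)
after = c_star_with_analogy(w1, w2)
print(before, after, after - before)
```

Since w_1 > w_2, the value (w_2 + 1)/(w_1 + 1) always exceeds w_2/w_1, but for the widths chosen here the gain is only from 1/4 to 1/3, which illustrates why the result of the analogy inference is called weak.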

Neither the classical theory nor modern theories of probability have 
been able to give a satisfactory account of and justification for the 
inference by analogy. This fact is not surprising since the degree of 
confirmation depends here not on relative frequencies but entirely on the widths 
of the properties involved, thus on magnitudes neglected by both 
classical and modern theories. 


E. The Inverse Inference 


The inverse inference is the inference from a sample to the whole popu- 
lation. This inference can be regarded as a special case of the predictive 
inference with the second sample covering the whole remainder of the 
population. Let M_i, e, s, s_i, e′, and w_i be as under (C). Let h be a 
statistical distribution which says that in the whole population of n individuals, 
of which the sample described in e is a part, there are n_i individuals with 
M_i (i = 1 to p). Then 

(9)    $c^*(h,e) = \dfrac{\prod_{i=1}^{p} \binom{n_i + w_i - 1}{n_i - s_i}}{\binom{n + \kappa - 1}{n - s}}$


This theorem shows that in the inverse inference, in distinction to the 
direct inference, c* is dependent not only upon the frequencies but also 
upon the widths of the properties. 
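
A small numerical sketch may help to see how this works in practice. It is not part of the original text: it assumes that the inverse-inference value is the product of the binomial coefficients C(n_i + w_i - 1, n_i - s_i) divided by C(n + κ - 1, n - s), with κ the sum of the widths of the division, and it uses hypothetical widths and sample counts. It checks that the values for all population distributions compatible with the sample add up to 1, and that two divisions with the same frequencies but different widths give different values.

```python
from fractions import Fraction
from math import comb, prod


def c_star_inverse(n_counts, s_counts, widths) -> Fraction:
    """c*(h, e) for population counts n_i, given sample counts s_i and widths w_i."""
    n, s, kappa = sum(n_counts), sum(s_counts), sum(widths)
    numerator = prod(
        comb(n_i + w_i - 1, n_i - s_i)
        for n_i, s_i, w_i in zip(n_counts, s_counts, widths)
    )
    return Fraction(numerator, comb(n + kappa - 1, n - s))


# Hypothetical division into p = 2 properties with widths 1 and 3 (kappa = 4).
widths = (1, 3)
s_counts = (3, 1)  # observed sample, s = 4
n = 10             # size of the whole population

# Every population distribution (n_1, n_2) compatible with the sample.
table = {
    (n1, n - n1): c_star_inverse((n1, n - n1), s_counts, widths)
    for n1 in range(s_counts[0], n - s_counts[1] + 1)
}
total = sum(table.values())

# Same frequencies, different widths: the values differ.
narrow = c_star_inverse((5, 5), (2, 2), (1, 3))
even = c_star_inverse((5, 5), (2, 2), (2, 2))
print(total, narrow, even)
```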


F. The Universal Inference 


The universal inference is the inference from an observed sample to a 
hypothesis of universal form. Let l be a factual sentence of the form 
‘(x)(Mx ⊃ M′x)’, where ‘M’ and ‘M′’ are factual molecular predicates. 
Hence l is an unrestricted simple law (D37-2a). As an example, let ‘M’ 
designate the property Swan and ‘M′’ White; hence l says that all swans 
are white. Let us take ‘M_1’ as an abbreviation for ‘M . ~M′’ (Non-White 
Swan), and let the width of ‘M_1’ be w_1. Then l is L-equivalent to 
‘(x)(~M_1 x)’ (‘there are no non-white swans’) (T21-5g(z)) and hence is 
a law with the strength w_1 (T37-3a). Let e be a conjunction of s full 
sentences of ‘~M_1’ with s different in. Thus e describes a sample of s 
individuals none of which violates the law l. Then, for any finite system 𝔏_N, 

(10)    $c^*(l,e) = \dfrac{\binom{s + \kappa - 1}{w_1}}{\binom{N + \kappa - 1}{w_1}}$

In the special case of a system containing ‘M_1’ as the only pr, we have 
w_1 = 1 and κ = 2, and hence c*(l,e) = (s + 1)/(N + 1). The latter value 
is given by some authors as holding generally (see Jeffreys [Probab.], 
p. 106 (16)). However, it seems plausible that the degree of confirmation 
must be smaller for a stronger law and hence depend upon w_1. 
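
The dependence on the width can be made concrete with a short computation. The sketch below is illustrative only: it assumes that c*(l,e) is the quotient of the binomial coefficients C(s + κ - 1, w_1) and C(N + κ - 1, w_1), and the particular numbers (s = 20, N = 100, κ = 4) are hypothetical.

```python
from fractions import Fraction
from math import comb


def c_star_law(s: int, N: int, w1: int, kappa: int) -> Fraction:
    """c*(l, e) for a law of strength w1, given s observed individuals, none violating l."""
    return Fraction(comb(s + kappa - 1, w1), comb(N + kappa - 1, w1))


s, N = 20, 100

# Special case: 'M_1' is the only primitive predicate, so w1 = 1 and kappa = 2.
special = c_star_law(s, N, w1=1, kappa=2)
print(special, special == Fraction(s + 1, N + 1))

# Hypothetical system with kappa = 4: the stronger the law (larger w1), the smaller c*.
values = [c_star_law(s, N, w1, kappa=4) for w1 in (1, 2, 3)]
print([float(v) for v in values])
```

When the sample exhausts the system (s = N), the same function returns 1, as one would expect.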

If s is very large in relation to κ, the following approximation holds: 

(11)    $c^*(l,e) \approx \left(\dfrac{s}{N}\right)^{w_1}$

For 𝔏_∞, 

(12)    $c^*(l,e) = 0$.

These theorems show that for finite systems the confirmation of the law l 
decreases with increasing N. This seems plausible because, the larger N is, 
the more is asserted by l. If N is very large, c* becomes very small; and for 
the infinite system it is 0. The latter result may seem surprising; it seems 
not in accord with the fact that scientists often say of a law that it is 
“well-confirmed”; this problem will be discussed under (G). 
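
The behavior for growing N can be tabulated. The following sketch is an illustration under stated assumptions, not the author's own computation: it takes the exact value for a finite system to be C(s + κ - 1, w_1)/C(N + κ - 1, w_1) and the approximation to be (s/N)^{w_1}, and compares the two for a hypothetical law with w_1 = 2 in a system with κ = 4.

```python
from fractions import Fraction
from math import comb


def c_star_law(s: int, N: int, w1: int, kappa: int) -> Fraction:
    """Exact value of c*(l, e) for a finite system with N individuals."""
    return Fraction(comb(s + kappa - 1, w1), comb(N + kappa - 1, w1))


def approx_law(s: int, N: int, w1: int) -> float:
    """Approximation (s / N) ** w1, intended for s large in relation to kappa."""
    return (s / N) ** w1


w1, kappa = 2, 4  # hypothetical width and number of Q-properties
s = 1000          # observed individuals, all conforming to the law

rows = []
for N in (2_000, 20_000, 200_000, 2_000_000):
    rows.append((N, float(c_star_law(s, N, w1, kappa)), approx_law(s, N, w1)))

for N, exact, approx in rows:
    print(f"N = {N:>9}: exact = {exact:.3e}, approx = {approx:.3e}")
```

With the sample held fixed, the exact value shrinks roughly like 1/N^2 here, which is the finite counterpart of the value 0 for the infinite system.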

Let us now consider the case where also negative instances are 
observed. Let e′ be an individual distribution for ‘M_1’ and ‘~M_1’ with s in 
with the cardinal numbers s_1 and s - s_1. Thus e′ says that the observed 
sample of s individuals contains s_1 negative instances (non-white swans). 
Obviously, in this case there is no point in taking as hypothesis the law 
in its original form l, because e′ and l are L-exclusive. We take instead the 
corresponding restricted law (D37-2b) l′ which says that all individuals 
not belonging to the sample described in e′ have the property ~M_1 (‘all 
unobserved swans are white’). Then, for 𝔏_N: 

(13)    $c^*(l',e') = \dfrac{\binom{s + \kappa - 1}{s_1 + w_1}}{\binom{N + \kappa - 1}{s_1 + w_1}}$

This shows that c*(l′,e′) decreases with an increase of N and even more 
with an increase in the number s_1 of violating cases. It can be shown that, 
under ordinary circumstances with large N, c* increases moderately when 
a new individual is observed which satisfies the original law l. On the 
other hand, if the new individual violates l, c* decreases very much, its 
value becoming a small fraction of its previous value. This seems in good 
agreement with the general conception. 

For the infinite system, c* is again 0, as in the previous case. 
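
The contrast between a new positive and a new negative instance can be seen in a short computation. This sketch assumes that the value for the restricted law is C(s + κ - 1, s_1 + w_1)/C(N + κ - 1, s_1 + w_1); the system size, the width, and the sample sizes are hypothetical and chosen only to show the order of magnitude.

```python
from fractions import Fraction
from math import comb


def c_star_restricted_law(s: int, s1: int, N: int, w1: int, kappa: int) -> Fraction:
    """c*(l', e') for a sample of s individuals containing s1 violating cases."""
    k = s1 + w1
    return Fraction(comb(s + kappa - 1, k), comb(N + kappa - 1, k))


N, w1, kappa = 10_000, 1, 4  # hypothetical system and law

base = c_star_restricted_law(100, 0, N, w1, kappa)            # 100 conforming cases
after_positive = c_star_restricted_law(101, 0, N, w1, kappa)  # one more conforming case
after_negative = c_star_restricted_law(101, 1, N, w1, kappa)  # one violating case instead

print(float(after_positive / base), float(after_negative / base))
```

Under these assumptions the conforming observation raises the value by about one percent, whereas the single violating observation reduces it to roughly a hundredth of its previous value.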


G. The Instance Confirmation of a Law 


Suppose we ask an engineer who is building a bridge why he has chosen 
the particular design. He will refer to certain physical laws and tell us 
that he regards them as “very reliable”, “well founded”, “amply 
confirmed by numerous experiences”. What do these phrases mean? It is clear 
that they are intended to say something about probability, or degree of 
confirmation. Hence, what is meant could be formulated more explicitly 
in a statement of the form ‘c(h,e) is high’ or the like. Here the evidence e is 
obviously the relevant observational knowledge. But what is to serve as 
the hypothesis h? One might perhaps think at first that h is the law in 
question, hence a universal sentence l of the form: ‘For every space-time 
point x, if such and such conditions are fulfilled at x, then such and such 
is the case at x’. I think, however, that the engineer is chiefly interested 
not in this sentence l, which speaks about an immense number, perhaps an 
infinite number, of instances dispersed through all time and space, but 
rather in one instance of l or a relatively small number of instances. When 
he says that the law is very reliable, he does not mean to say that he is 
willing to bet that among the billion of billions, or an infinite number, of 
instances to which the law applies there is not one counterinstance, but 
merely that this bridge will not be a counterinstance, or that among all 
bridges which he will construct during his lifetime there will be no 
counterinstance. Thus h is not the law l itself but only a prediction concerning 
one instance or a relatively small number of instances. Therefore, what 
is vaguely called the reliability of a law is measured not by the degree of 
confirmation of the law itself but by that of one or several instances. This 
suggests the subsequent definitions. They refer, for the sake of simplicity, 
to just one instance; the case of several, say, one hundred, instances can 
then easily be judged likewise. Let e be any non-L-false, nongeneral 
sentence. Let l be a simple law (D37-1) of the form (i_k)(𝔐_i). Then we 
understand by the instance confirmation of l on the evidence e, in symbols 
‘c*_i(l,e)’, the degree of confirmation, on the evidence e, of the hypothesis 
that a new individual not mentioned in e fulfils the law l: 


(14)    $c_i^*(l,e) =_{Df} c^*(h,e)$ ,


where h is an instance of 𝔐_i formed by substituting for i_k an in not 
occurring in e. 

The second concept, now to be defined, seems in many cases to repre- 
sent still more accurately what is vaguely meant by the reliability of a 
law l. We suppose here that l has the frequently used conditional form 
mentioned earlier: ‘(x)(Mx ⊃ M′x)’ (e.g., ‘all swans are white’). By the 
qualified-instance confirmation of the law that all swans are white we 
mean the degree of confirmation for the hypothesis h′ that the next swan 
to be observed will likewise be white. The difference between the 
hypothesis h used previously for the instance confirmation and the 
hypothesis h′ just described consists in the fact that the latter concerns an 
individual which is already qualified as fulfilling the condition M. That is 
the reason why we speak here of the qualified-instance confirmation, in 
symbols ‘c*_qi’: 


(15)    $c_{qi}^*(\text{‘M’}, \text{‘M′’}, e) =_{Df} c^*(h', e \cdot j)$ ,


where j is a full sentence of ‘M’ with an in not occurring in e, and h′ is the 
full sentence of ‘M′’ with the same in. 

We shall now give two theorems concerning the concepts just defined. 
Let l be ‘(x)(Mx ⊃ M′x)’. Let ‘M_1’ be defined, as earlier, by ‘M . ~M′’ 
(Non-White Swan) and ‘M_2’ by ‘M . M′’ (White Swan). Let the widths 
of ‘M_1’ and ‘M_2’ be w_1 and w_2, respectively. Let e be a report about s 
observed individuals saying that s_1 of them are M_1 (negative cases) and s_2 
are M_2, while the remaining ones are ~M (Non-Swan) and hence neither 
M_1 nor M_2. Then the following holds: 


(16)    $c_i^*(l,e) = 1 - \dfrac{s_1 + w_1}{s + \kappa}$

(17)    $c_{qi}^*(\text{‘M’}, \text{‘M′’}, e) = 1 - \dfrac{s_1 + w_1}{s_1 + w_1 + s_2 + w_2}$


The values of the two functions stated by these theorems are 
independent of N and hold therefore for all finite and infinite systems. The values 
for the case that the observed sample does not contain any individuals 
violating the law l can easily be obtained from the values stated by 
taking s_1 = 0. 

It can be shown that, if the number s_1 of observed negative cases is 
either 0 or a fixed small number, then, with the increase of the sample 
size s, both c*_i and c*_qi grow close to 1, in contradistinction to c* for the 
law itself. This justifies the customary manner of speaking of “very 
reliable” or “well-founded” or “well-confirmed” laws, provided we interpret 
these phrases as referring to a high value of either of our two concepts just 
introduced. Understood in this sense, the phrases are not in contradiction 
to the previous results that the c* of a law is very small in a large system 
and 0 in the infinite system. 
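
The growth of both values toward 1 can be tabulated directly. The sketch below is an illustration, not part of the original text. It assumes the instance confirmation to be 1 - (s_1 + w_1)/(s + κ) and the qualified-instance confirmation to be 1 - (s_1 + w_1)/(s_1 + w_1 + s_2 + w_2), and it further assumes a hypothetical system with two primitive predicates (Swan, White), so that κ = 4 and w_1 = w_2 = 1, and a sample in which half of the observed individuals are white swans.

```python
from fractions import Fraction


def instance_confirmation(s: int, s1: int, w1: int, kappa: int) -> Fraction:
    """Instance confirmation of the law: 1 - (s1 + w1) / (s + kappa)."""
    return 1 - Fraction(s1 + w1, s + kappa)


def qualified_instance_confirmation(s1: int, s2: int, w1: int, w2: int) -> Fraction:
    """Qualified-instance confirmation: 1 - (s1 + w1) / (s1 + w1 + s2 + w2)."""
    return 1 - Fraction(s1 + w1, s1 + w1 + s2 + w2)


kappa, w1, w2 = 4, 1, 1  # hypothetical system: Swan and White as the only pr
s1 = 0                   # no negative cases observed

table = []
for s in (10, 100, 1000):
    s2 = s // 2          # assumption of this sketch: half of the sample are white swans
    table.append((
        s,
        instance_confirmation(s, s1, w1, kappa),
        qualified_instance_confirmation(s1, s2, w1, w2),
    ))

for s, ci, cqi in table:
    print(s, ci, cqi)
```

Neither function takes N as an argument, which reflects the remark that the two values hold for all finite and infinite systems alike.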

These concepts will also be of help in situations of the following kind. 
Suppose an observer X has observed certain events and finds two L-ex- 
clusive laws, each of which would explain the observed events satisfac- 
torily. Which of them should he prefer? With respect to a finite system, 
he may take the law with the higher c. With respect to the infinite system, 
however, this method of comparison fails, because for either law c = 0 
(in the case of c* and similar functions). Here the concept of instance con- 
firmation (or that of qualified-instance confirmation) will help. If it has 
a higher value for one of the two laws, then this law will be preferable, if 
no reasons of another nature are against it. 


H. Are Laws Needed for Making Predictions? 


The expectations of future events which people actually entertain are 
influenced not only by rational factors but also by irrational ones like 
wishful thinking or fear. In a rational procedure, expectations should 
somehow be “founded upon” or “inductively inferred from” past ex- 
periences, in some sense of those phrases. In order to see more clearly how 
this is to be done and, in particular, which part in this procedure is played 
by laws, let us consider the following simplified schema. Suppose that X 
wants to determine, either for practical purposes of everyday life or for 
theoretical purposes of science, whether it would be reasonable for him to 
expect that a given individual c is M′ in view of his earlier experiences. 
Let h be this prediction ‘M′c’. Suppose that his relevant observational 
results are as follows: (1) Many other things were M and all of them were 
also M′; let this be formulated in the sentence e; (2) c is M; let this be j. 
Thus he knows e and j by observation. How does he proceed from these 
premises to the desired conclusion h? It is clear that this cannot be done 
by deduction; an inductive procedure must be applied. What is this 
inductive procedure? It is usually explained in the following way. From the 
evidence e, X infers inductively the law l which says that all M are M′; this 
inference is supposed to be inductively valid because e contains many 
positive and no negative instances of the law l; then he infers h (‘c is 
white’) from l (‘all swans are white’) and j (‘c is a swan’) deductively. 
Now let us see how the procedure appears from the point of view of our 
inductive logic. One might perhaps be tempted to transcribe the usual 
description of the procedure just given into technical terms as follows. X 
infers l from e inductively because c(l,e) is high; since l . j L-implies h, 
c(h, e . j) is likewise high; thus h may be inductively inferred from e . j. 
However, this way of reasoning would not be correct, because, under 
ordinary conditions, c(l,e) (at least for c* and similar functions) is not high but 
very low, and even 0 if the number of individuals is infinite. The difficulty 
disappears when we realize, on the basis of our previous discussions, that 
X does not need a high c* for l in order to obtain the desired high c* for h; 
all he needs is a high c*_qi for l; and this he has by knowing e and j. Thus we 
see that X need not take the roundabout way through the law l at all, 
as is usually believed; he can instead go from his observational knowledge 
e . j directly to the singular prediction h. That is to say, our inductive 
logic makes it possible to determine c*(h, e . j) directly and to find that 
it has a high value, without making use of any law. Customary thinking 
in everyday life likewise often takes this short cut, which is now justified 
by inductive logic. For instance, suppose somebody asks X what he ex- 
pects to be the color of the next swan he will see. Then X may reason 
like this: he has seen many white swans and no non-white swans; there- 
fore he presumes, admittedly not with certainty, that the next swan will 
likewise be white; and he is willing to bet on it. Perhaps he does not even 
consider the question whether all swans in the universe without a single 
exception are white; and, if he did, he would not be willing to bet on the 
affirmative answer. 
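
The contrast described here can be put in numbers. The following sketch is illustrative only. It assumes the value of the law to be C(s + κ - 1, w_1)/C(N + κ - 1, w_1) and the value of the singular prediction to be 1 - (s_1 + w_1)/(s_1 + w_1 + s_2 + w_2); it also assumes a hypothetical system with κ = 4, w_1 = w_2 = 1, a universe of one billion individuals, and an observer who has seen one thousand swans, all of them white.

```python
from fractions import Fraction
from math import comb


def c_star_law(s: int, N: int, w1: int, kappa: int) -> Fraction:
    """c*(l, e): confirmation of the universal law itself in a system of N individuals."""
    return Fraction(comb(s + kappa - 1, w1), comb(N + kappa - 1, w1))


def c_star_prediction(s1: int, s2: int, w1: int, w2: int) -> Fraction:
    """c*(h, e . j): the next swan is white; no law is used in the computation."""
    return 1 - Fraction(s1 + w1, s1 + w1 + s2 + w2)


kappa, w1, w2 = 4, 1, 1  # hypothetical system: Swan and White as the only pr
N = 10**9                # hypothetical number of individuals in the universe
s2, s1 = 1000, 0         # observed: 1000 white swans, no non-white swan
s = s1 + s2

law_value = c_star_law(s, N, w1, kappa)
prediction_value = c_star_prediction(s1, s2, w1, w2)
print(float(law_value), float(prediction_value))
```

On these assumptions the law itself receives a value of about one in a million, while the prediction about the next swan receives a value above 0.99, so that X would bet on the prediction but not on the law.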

We see that the use of laws is not indispensable for making predictions. 
Nevertheless it is expedient, of course, to state universal laws in books on 
physics, biology, psychology, etc. Although these laws stated by scientists 
do not have a high degree of confirmation, they have a high qualified- 
instance confirmation and thus serve as efficient instruments for finding 
those highly confirmed singular predictions which are needed in practical 
life. 


I. The Variety of Instances 


A generally accepted rule of scientific method says that for testing a 
given law we should choose a variety of specimens as great as possible 
(§ 47E). Suppose that one physicist tests the law l by making 
experiments with one hundred specimens, all of the same kind, and finds all 
results positive. Suppose that another physicist does the same with one 
hundred specimens taken from various kinds and finds likewise positive 
results. Let e_1 express the common prior knowledge of both physicists and 
the results of the hundred experiments of the first physicist; let e_2 be the 
corresponding statement for the second physicist. Then we should say 
that the second physicist has made a more thoroughgoing examination of 
the law and therefore has more reason than the first to believe in the law l 
or in the prediction h of a future instance of the law. Therefore, we should 
require of an adequate explicatum c that its value for l or for h be higher on 
e_2 than on e_1. Ernest Nagel ([Principles], pp. 68-71) has discussed this 
problem in detail and explained the difficulties involved in finding a 
concept of degree of confirmation that would satisfy the requirement; he 
expresses doubts as to whether such a concept can be found at all. However, 
it can be shown that c* satisfies the requirement (see [Inductive] § 15). 


J. Inductive Logic as a Rational Reconstruction 


Sometimes a theory is offered as a “rational reconstruction” of a body 
of generally accepted but more-or-less vague beliefs. This means that the 
theory introduces explicata for the concepts involved in those beliefs and 
that the content of the beliefs is represented in a more exact and more 
systematic form by statements of the theory. The demand for a justifica- 
tion of a theory proposed as a rational reconstruction may be understood 
in two different ways. (1) The first, more modest task is to validate the 
claim that the new theory is a satisfactory reconstruction of the beliefs in 
question. It must be shown that the statements of the theory are in suffi- 
cient agreement with those beliefs; this comparison is possible only on 
those points where the beliefs are sufficiently precise. The question 
whether the given beliefs are true or false is here not even raised. (2) The 
second task is to show the validity of the new theory and thereby of the 
given beliefs. This is a much deeper-going and often much more difficult 
problem. 

For example, Euclid’s axiom system of geometry was a rational recon- 
struction of the beliefs concerning spatial relations which were generally 
held, based on experience and intuition, and applied in the practices of 
measuring, surveying, building, etc. Euclid’s axiom system was accepted 
because it was in sufficient agreement with those beliefs and gave a more 
exact and consistent formulation for them. In other words, it was a ration- 
al reconstruction and systematization. A critical investigation of the 
validity, the factual truth, of the axioms and the beliefs was not made 
until more than two thousand years later by Gauss and Einstein. 

The system of inductive logic here proposed, that is, the theory of c* 
based on the definition of this function, is intended as a reconstruction, 
restricted to a simple language form, of inductive thinking as customarily 
applied in everyday life and in science. However, it is meant not merely as 
an uncritical representation of customary ways of thinking with all their 
defects and inconsistencies, but rather as a rational, critically corrected 
reconstruction. It is intended to lead to results which are more systema- 
tized, more consistent, and in certain points more correct than customary 
ways of thinking. One method of inductive thinking is regarded as more 
correct or more reasonable than another one if it is in better accord with 
the basic principle of inductive reasoning, which says that expectations 
for the future should be guided by the experiences of the past. More spe- 
cifically: what has been observed more frequently should, under otherwise 
equal conditions, be regarded as more probable for the future. 

Since the implicit rules of customary inductive thinking are rather 
vague, any rational reconstruction contains statements which are neither 
supported nor rejected by the ways of customary thinking. Therefore, a 
comparison is possible only on those points where the procedures of cus- 
tomary inductive thinking are precise enough. It seems to me that on these 
points the theory of c* is sufficiently in agreement with customary 
inductive thinking to be regarded as an adequate reconstruction. This 
agreement is found in many theorems, of which a few have been indicated in 
this appendix. And it seems further that in the points where there is a 
divergence, the theory of c* is more correct than customary thinking in 
the sense just explained. 


GLOSSARY 


Brief explanations are given for the main terms used in this volume. The exact 
definitions are stated in the body of the book (sometimes here referred to by their 
number or section); here we give only rough indications to help the reader’s memory. 
Sometimes two explanations are given separated by a solidus ‘/’; in such cases the first 
holds for the explicandum, the second for the explicatum proposed in this book. 

* A starred word is explained elsewhere in this glossary. 


A 


Absolute frequency (af) of the property M in the class K: the number of those 
elements of K which have the property M (D104-1a). 

Additional evidence: If to the available *evidence e an additional evidence i is added, 
with regard to a *hypothesis h, then we call e the prior evidence, e . i the posterior 
evidence, c(h,e) the prior confirmation of h, c(h, e . i) the posterior confirmation of h, 
c(i,e) the expectedness of i, c(i, e . h) the likelihood of i (§ 60). i is said to be 
predictable if it follows from e . h (§ 61); c(h, e . i)/c(h,e) is called the relevance 
quotient (D66-1a). 

Almost L-false sentence i: i is not *L-false, but m(i) = 0 (D58-1b, T58-3a). 

Almost L-true sentence i: i is not *L-true, but m(i) = 1 (D58-1a). 

Atomic sentence: consisting of a *primitive predicate and one or more *individual 
constants (D16-6a). 

Attribute: property or relation. 

B 


(c is) based upon m: c(h,e) = m(e . h)/m(e) (D55-3). 

Basic pair: consisting of an *atomic sentence and its *negation (D16-6c). 
Basic sentence: *atomic sentence or *negation of such (D16-6b). 
Biconditional: i ≡ j; i if and only if j (§ 15A). 


C 


c-function: any numerical function considered as an *explicatum for *probability_1. 

c-mean estimate of an unknown magnitude: the weighted mean of the possible values 
of the magnitude with *degree of confirmation as weight (i.e., the sum of the pos- 
sible values each multiplied with the *degree of confirmation for its occurrence) 
(§§ 99, 100A). 

Classificatory concept: a property or relation of a simple kind, neither *comparative 
nor *quantitative (§ 4). 

Classificatory concept of confirmation (ℭ(h,i,e)): the *additional evidence i is 
confirming evidence for the *hypothesis h (on the basis of the *prior evidence e) 
(§§ 8, 86). 

Comparative concept: a relation characterizing a thing in comparison with another 
thing in terms of ‘more’ (or ‘more or equal’) without using numerical values (e.g., 
‘x is warmer than y’) (§ 4). 

Comparative concept of confirmation (𝔐ℭ(h,e,h′,e′)): the *hypothesis h is confirmed 
by the *evidence e more strongly or equally strongly as h′ by e′ (§§ 8, 79, 81). 

Concept: property, relation, or function (§ 3). 

Conditional: i ⊃ j; if i then j (§ 15A). 

Conjunction: i . j; i and j (§ 15A). 

Connectives: the following signs: ‘~’ (*negation), ‘V’ (*disjunction), ‘.’ (*conjunction), 
‘⊃’ (*conditional), ‘≡’ (*biconditional) (§ 15A). 

Correlation of the in: a one-one relation among all *individual constants (D26-1). 


D 


Deductive inference: inference based upon *L-implication. 

Deductive logic: the theory of logical deduction / the theory of *L-implication, 
*L-truth, etc. (§§ 20, 43B). 

Degree of confirmation (c(h,e)): a *quantitative concept representing the degree to 
which the assumption of the *hypothesis h is supported by the *evidence e (§§ 8 
to 10A). 

Direct inference: the *inductive inference from the *population to a *sample (§ 44B). 

Disjunction: i V j; i or j (or both) (§ 15A). 

Division: an exhaustive and nonoverlapping set of properties (or *predicates designat- 
ing them) (D25-4). 

E 

Empty class = *Null class. 

Empty property M (or *predicate ‘M’): not holding for any *individual (D25-1b). 

Estimate; see c-mean estimate. 

Evidence: (a sentence expressing) the knowledge (usually results of observations) 
available to the observer and used by him as a basis for determining the *degree of 
confirmation of a *hypothesis or an *estimate (§§ 8, 10A). 

Existential quantifier: ‘(∃x)’, ‘there is an *individual x such that . . .’ (§ 15A). 

Existential sentence: consisting of an *existential quantifier and a *matrix as its *scope 
(e.g., ‘(∃x)Px’) (§§ 15A, 16). 

Expectedness; see Additional evidence. 

Explication: the introduction of a new, exact *concept (the explicatum) to take the 
place of a given inexact *concept (the explicandum) (§ 2). 

Expression: a finite sequence of *signs (§ 14). 


F 


F-false sentence: *factual and false (§ 20). 
F-true sentence: *factual and true (§ 20). 
Factual predicate ‘M’ (or property M): ‘Ma’ is a *factual sentence (§ 25). 
Factual sentence: contingent, synthetic / neither *L-true nor *L-false (§ 20). 
Frequency; see Absolute frequency; Relative frequency. 
Frequency concept of probability; see Probability_2. 
Full matrix of ‘M’: a *matrix consisting of ‘M’ and *individual signs, e.g., ‘Mx’ (§ 25). 
Full sentence of ‘M’: a sentence consisting of ‘M’ and *individual constants, e.g., 
‘Ma’ (§ 25). 
G 


General sentence: containing *variables (D16-6f). 
H 


Hypothesis: a sentence concerning unknown facts (e.g., a prediction or a law) which 
is judged on the basis of given *evidence (§§ 8, 10A). 


I 


Identity sentence: ‘a = b’, ‘a is the same *individual as b’ (§ 15A). 
in: *individual constant. 
Individual constants (in): names of *individuals, ‘a’, ‘b’, etc. (§ 15A). 

Individual distribution for a given *division and n given *individual constants: a 
sentence specifying for each of the n *individuals to which kind in the *division it 
belongs / a *conjunction of n *full sentences of the *predicates of the *division with 
one each of the given *individual constants (D26-6a). 

Individual sign: *individual constant or *individual variable. 

Individual variables: the variables ‘x’, ‘y’, etc., whose values are the *individuals; they 
are used in *quantifiers (§ 15A). 

Individuals: the things or events or positions which constitute the universe of dis- 
course (§ 15A). 

Inductive inference: an inference which is nondeductive, nondemonstrative / 
determination of the *degree of confirmation c(h,e) (in particular, when e *L-implies neither 
h nor ~h) (§ 44B). 

Inductive logic: theory of the (*classificatory, *comparative, or *quantitative) con- 
cept of confirmation (§§ 8, 10A, 43). 

Inference; see Deductive inference; Inductive inference. 

Initial confirmation = *Null confirmation. 

Initially relevant: *relevant with respect to the *tautologous evidence (D65-2). 

Inverse inference: the *inductive inference from a *sample to the *population (§ 44B). 

i is irrelevant to h on e: the *degree of confirmation of h remains unchanged when i is 
added to e (D65-1d). 

Isomorphic sentences i, j: j is formed from i by replacing each *individual constant 
occurring in i by its correlate with respect to a *correlation of the in (D26-3a). 


L 
L-disjunct sentences i, j: the *disjunction i V j is *L-true (§ 20). 
L-equivalent sentences i, j: i and j have the same content; they entail each other 
logically / i and j have the same *range (§ 20). 
L-exclusive sentences i, j: i is logically incompatible with j / the *conjunction i . j is 
*L-false (§ 20). 
L-false sentence i: i is logically false, self-contradictory / the *range of i is *empty, 
i holds in no *state-description (§ 20). 
i L-implies j: i logically implies j; j follows logically from i / the *range of i is 
contained in that of j (§ 20). 
L-true sentence i (⊢ i): i is logically true, analytic / i has the universal *range, i holds 
in every *state-description (§ 20). 
Law; see Simple law and § 37. 
Likelihood; see Additional evidence. 
Logic; see Deductive logic; Inductive logic. 
Logical width; see Width. 
M 


m-function: a measure function assigning numerical values first to the 
*state-descriptions, then to all sentences, representing an *explicatum for *probability_1 
a priori (§ 55A). 

Matrix, sentential: sentence or sentence-like expression with free *variables, e.g., 
‘~Px V Rax’ (§ 15A). 

Metalanguage: the language in which we make statements about the symbolic *object 
language; the metalanguage used in this book is the English word-language supple- 
mented with some technical signs (e.g., ‘m’, ‘, ‘pr’, @, P’, etc.) (§ 14). 

Molecular predicate: a *predicate introduced as an abbreviation for a *molecular 
predicate expression (§ 25). 

Molecular predicate expression: consisting of *primitive predicates and *connectives, 
e.g., ‘~P_1 V P_2’ (§ 25). 
Molecular property: designated by a *molecular predicate expression (§ 25). 
Molecular sentence: consisting of *atomic sentences and *connectives (D16-6e). 


N 

Negation: ~i; not-i (§ 15A). 

i is negatively relevant (or negative) to h on e: the *degree of confirmation of h is 
decreased when i is added to e (D65-1b). 

Nongeneral sentence: not containing *variables (D16-6g). 

Null class: class to which no element belongs. 

Null confirmation: *degree of confirmation before any *factual *evidence is 
available / c(h,t), where ‘t’ is *tautologous (§ 57B). 


O 


Object language: the language investigated (not used) in a certain context; in this 
book, the symbolic systems 𝔏. 


P 

Population: the class of *individuals studied in a given investigation (§ 44B). 

i is positively relevant (or positive) to h on e: the *degree of confirmation of h is 
increased by the addition of i to e (D65-1a). 

Posterior confirmation, Posterior evidence; see Additional evidence. 

pr: *primitive predicate. 

Predicate: a *sign designating an *attribute, e.g., ‘P’, ‘M’ (§ 15A). 

Predicate expression: an *expression designating an *attribute (§ 25). 

Predictable; see Additional evidence. 

Predictive inference: the *inductive inference from one *sample to another *sample 
(§ 44B). 

Primitive predicates: the undefined *predicates of a language system, e.g., ‘P_1’, ‘P_2’, 
etc., ‘R_1’, etc. (§ 15A). 

Prior confirmation, Prior evidence; see Additional evidence. 

Probability_1: the logical concept of probability, *degree of confirmation (§ 9). 

Probability_2: the statistical concept of probability, *relative frequency in the long 
run (§ 9). 

Psychologism: wrong interpretation of logical problems in psychological terms 
(§§ 11, 12). 

Q 

Q-predicates: the *predicates designating *Q-properties, ‘Q,’, ‘Q,’, etc. (§ 31). 

Q-properties: the strongest *factual properties in a language system (§ 31). 

Quantifier; see Existential quantifier; Universal quantifier. 

Quantitative concept: function with numerical values (§ 4). 

Quantitative concept of confirmation: *degree of confirmation with numerical values 
(§ 8). 

R 

Range of a sentence i: the class of *state-descriptions in which i holds (§ 18D). 

Regular c-function: any *c-function fulfilling certain plausible conventions (§ 53) / a 
*c-function *based on a *regular m-function (D55-4). 

Regular m-function: any *m-function fulfilling certain plausible conventions (§ 53) / 
any *m-function whose values for *state-descriptions are positive numbers whose 
sum is 1 (§ 55A). 

Relative frequency of the property M in the class K: the *absolute frequency of M 
in K divided by the cardinal number of K (D104-1b). 

Relative width of ‘M’: the *width of ‘M’ divided by the number of the *Q-predicates 
(κ) (D32-1b). 

Relevance quotient; see Additional evidence. 
Relevant: either *positively relevant or *negatively relevant (D65-1c). 


S 


Sample: a subclass (usually described in the *evidence) of the *population (§ 44B). 

Scope of a *quantifier: the *matrix following it. 

Semantics: the analytic theory of meaning (designation), truth, logical deduction 
(*L-implication), etc. (§ 8). 

Signs: the smallest units of which the *expressions of the *object language consist, 
e.g., ‘P’, ‘x’, ‘V’, etc. (D16-1). 

Simple law: a *universal sentence whose *scope contains no *quantifier (e.g., 
‘(x)(P_1x ⊃ P_2x)’) (D37-1). 

Singular sentence, hypothesis: concerning one *individual (D16-6i). 

Singular predictive inference: *predictive inference with a *singular hypothesis 
(§ 44B). 

State-description (ℨ): a sentence (or class of sentences) describing completely a 
possible state of affairs of the universe of discourse / a *conjunction (or class of 
sentences) containing as components (or elements) one sentence out of each *basic 
pair (D18-1). 

Statistical distribution for a given *division and n given *individual constants: a 
sentence which states how many (but not, which) of the n given *individuals belong 
to each of the kinds in the *division / a *disjunction of all *individual distributions 
which are *isomorphic to a given one (D26-6c). 

Statistical inference: *inductive inference involving *frequencies (§ 44B). 

Structure-description (Str): a sentence which states for each *Q-property how many 
(but not, which) of the *individuals belong to it / a *disjunction of all *state-de- 
scriptions which are *isomorphic to a given one (D27-1). 

Symmetrical c-function: a *c-function treating all *individuals on a par / a *c-function 
*based upon a *symmetrical m-function (§§ 90, 91). 

Symmetrical m-function: an *m-function treating all *individuals on a par / an *m- 
function which has equal values for *isomorphic *state-descriptions (§ 90). 
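
The relation between state-descriptions, structure-descriptions, and symmetrical m-functions can be sketched in Python. The individuals, the two Q-properties, and all function names below are hypothetical choices for illustration. The sketch also assumes that two state-descriptions are isomorphic exactly when they agree on how many individuals fall under each Q-property, which is how the glossary entry for structure-description characterizes them.

```python
from fractions import Fraction
from itertools import product

INDIVIDUALS = ("a", "b", "c")   # hypothetical individual constants
Q_PROPERTIES = ("Q1", "Q2")     # hypothetical; one primitive predicate yields two


def state_descriptions():
    """Each state-description assigns exactly one Q-property to each individual."""
    return list(product(Q_PROPERTIES, repeat=len(INDIVIDUALS)))


def q_numbers(sd):
    """How many (but not which) individuals have each Q-property."""
    return tuple(sd.count(q) for q in Q_PROPERTIES)


def structure_descriptions():
    """Group isomorphic state-descriptions; each group is one structure-description."""
    groups = {}
    for sd in state_descriptions():
        groups.setdefault(q_numbers(sd), []).append(sd)
    return groups


def is_symmetrical(m):
    """Symmetrical m-function: equal values for isomorphic state-descriptions."""
    return all(len({m[sd] for sd in group}) == 1
               for group in structure_descriptions().values())


groups = structure_descriptions()
print(sorted(groups))            # [(0, 3), (1, 2), (2, 1), (3, 0)]
print(len(groups[(2, 1)]))       # 3

# One symmetrical choice (an assumption for the demo): equal weight per
# structure-description, shared equally among its state-descriptions.
m = {sd: Fraction(1, len(groups) * len(g)) for g in groups.values() for sd in g}
print(is_symmetrical(m), sum(m.values()))   # True 1
```

Grouping by Q-numbers avoids enumerating permutations of the individual constants explicitly, which keeps the sketch short for small languages.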


T

Tautologous: *L-true (in propositional logic), e.g., ‘Pa ∨ ~Pa’.


U


Universal inference: the *inductive inference from a *sample to a *universal sentence 
as *hypothesis (§ 44B). 
Universal property M (or predicate ‘M’): holding for every *individual (D25-1a). 
Universal quantifier: ‘(x)’, ‘for every *individual x, ...’ (§ 15A).
Universal sentence: consisting of a *universal quantifier and a *matrix as its *scope
(e.g., ‘(x)(P1x ⊃ P2x)’) (§§ 15A, 16).
V


Variable: all variables in our systems are *individual variables. 


W


Width of the *molecular predicate ‘M’ is w: M is a *disjunction of w *Q-properties 
(D32-1a). 
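
Width and relative width can be computed mechanically for a small language. The following Python sketch assumes two hypothetical primitive predicates, P1 and P2, so that there are four Q-properties; a molecular predicate is represented as a truth function of the primitives. The names are illustrative and not taken from the text.

```python
from fractions import Fraction
from itertools import product

PRIMITIVES = ("P1", "P2")   # hypothetical primitive predicates

# Each Q-property settles, for every primitive predicate, whether it holds.
Q_PROPERTIES = list(product((True, False), repeat=len(PRIMITIVES)))


def width(molecular):
    """Number of Q-properties in the disjunction equivalent to the predicate."""
    return sum(1 for q in Q_PROPERTIES if molecular(dict(zip(PRIMITIVES, q))))


def relative_width(molecular):
    """Width divided by the number of Q-properties."""
    return Fraction(width(molecular), len(Q_PROPERTIES))


print(width(lambda v: v["P1"] or v["P2"]))            # 3
print(relative_width(lambda v: v["P1"] or v["P2"]))   # 3/4
print(relative_width(lambda v: v["P1"]))              # 1/2
```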


ℨ: *state-description.
Mathematical expectation, 525, 528 ff.
Mode, 523
*Molecular predicate, 105 
*Molecular predicate expression, 104 f. 
*Molecular property, 105 
*Molecular sentence, 67 
Molina, E. C., 583, 593 
Morgenstern, O., 268, 593 


Morris, C., 55 

N 
N, 59 
Naess, A., 8 


Nagel, E., 24, 220, 230, 234, 429 f., 558, 575,
593, 595
Natural number, 62 
*Negation, 61 
*Negatively relevant, 347 f. 
Nelson, E. J., 593, 595 
Neumann, J. von, 268, 593 
Neurath, O., 587 
Neyman, J., 28, 508, 516 f., 518, 593 
Nicod, J., 469 f., 593 
Nisbet, R. H., 593 
Nitsche, A., 554, 593 
*Nongeneral sentence, 67 
Normal form, 94 ff. 
Normal function, 153 f., 504 


INDEX 


Normal law, 504 ff. 
Northrop, F. S. C., 16, 593 

*Null confirmation, 289, 307 ff. 
Null evidence, 557 f. 
= degree of confirmation, 25 ff.
limit definition, 187, 252, 552 f.
recent origin, 183 ff. 
Q, 124 f.
Q-division, 126 
Ramsey, F. P., 36, 45 f., 594
Random order, 28, 495 
Random sample, 493 f. 
Requirement of completeness, 74 ff.
rf; see Relative frequency
Risk function, 517 
Ritchie, A. D., 595 
Sass, L. D., 501
Schematization, 209 ff. 


Square error, 537 
frequency conception of probab., 24, 187
Subjective concept, 238
T
probability integral, 154
Tautological evidence; see Null confirmation
tf; see Truth-frequency
Wright, G. H. von, 342, 583, 598


Y
Yule, G. U., 598 


Z


ℨ; see State-description
In the object language (systems 𝔏):


~ V, n61 

⊃, ≡, 61 f.

a, 60, 62 

=, 61 

¥, 62 

In the metalanguage: 
BOY, = bb = pe Os 57 
n!, 149

=, 150 

(2), 150 f. 

[3], 152 x 

φ (normal function), 153 f.
Φ (probab. integral), 153 f.